Abstract

In this work we design a general method for proving moment inequalities for polynomials 
of independent random variables. Our method works for a wide range of random variables 
including Gaussian, Boolean, exponential, Poisson and many others. We apply our method to 
derive general concentration inequalities for polynomials of independent random variables. We 
show that our method implies concentration inequalities for some previously open problems, e.g. 
permanent of random symmetric matrices. We show that our concentration inequality is stronger 
than the well-known concentration inequality due to Kim and Vu [31]. The main advantage of
our method in comparison with the existing ones is a wide range of random variables we can 
handle and bounds for previously intractable regimes of high degree polynomials and small 
expectations. On the negative side we show that even for boolean random variables each term 
in our concentration inequality is tight. 
1 Introduction

Concentration and moment inequalities are vital for many applications in Discrete Mathematics, 
Theoretical Computer Science, Operations Research, Machine Learning and other fields. In the 
classical setting we have $n$ independent random variables $X_1, \ldots, X_n$ and we are interested in the
behavior of a function $f(X_1, \ldots, X_n)$ of these random variables. Probably the first concentration
inequality with exponential bounds for tails was proven by S. Bernstein [11], who showed that if
$X_i$ are random variables that take values $+1$ or $-1$ with probability $1/2$ (i.e. Rademacher random
variables) then
More general inequalities known as Chernoff Bounds became part of the mathematical jargon to the
extent that many papers in Theoretical Computer Science use them without stating the inequalities.
In the last 20 years this area of Probability Theory and the related area of mathematics studying
measure concentration have flourished, driven by the variety of applications and settings. The surveys
and books [22, 17, 13] provide the historical and mathematical background in this area.



The most general and powerful methods known to date for proving such inequalities are Ledoux's
entropy method [34] and Talagrand's famous isoperimetric inequality [45]. Yet, as was noticed by
Vu [49], these methods and the corresponding inequalities work well only when the Lipschitz coefficients
of the function $f(X_1, \ldots, X_n)$ are relatively small. The standard example showing the weakness
of such methods is the number of triangles in random graphs $G(n,p)$. Until the concentration
inequality due to Kim and Vu [31], no non-trivial concentration of this function about its mean was
known.
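To make the triangle example concrete, the following minimal sketch writes the triangle count of $G(n,p)$ as a degree-3 multilinear polynomial in the edge indicators. The function names (`sample_edges`, `triangle_polynomial`, `expected_triangles`) are hypothetical and chosen only for this illustration.

```python
import itertools
import math
import random


def sample_edges(n, p, rng):
    """Sample edge indicators x_uv of G(n, p), keyed by pairs (u, v) with u < v."""
    return {e: 1 if rng.random() < p else 0
            for e in itertools.combinations(range(n), 2)}


def triangle_polynomial(n, x):
    """Evaluate f(x) = sum over triples u < v < w of x_uv * x_vw * x_uw."""
    return sum(x[(u, v)] * x[(v, w)] * x[(u, w)]
               for u, v, w in itertools.combinations(range(n), 3))


def expected_triangles(n, p):
    """E[f] = C(n, 3) * p^3, since the three edges of a triangle are independent."""
    return math.comb(n, 3) * p ** 3


if __name__ == "__main__":
    rng = random.Random(0)
    x = sample_edges(30, 0.2, rng)
    print(triangle_polynomial(30, x), expected_triangles(30, 0.2))
```

Changing a single edge indicator can change the count by up to $n - 2$, which is why worst-case Lipschitz coefficients are large here even though the typical effect of one edge is much smaller.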



Kim and Vu [31] introduced the notion of average Lipschitz coefficients based on the partial derivatives
of a polynomial evaluated at the point $(E[X_1], \ldots, E[X_n])$ (in the multilinear case). These new
parameters enabled them to prove a concentration inequality for polynomials of boolean random
variables. This inequality has been applied to the problem of approximately counting triangles in
(e.g.) a social network by sampling the edges [47, 48], to average-case correlation clustering [37],
and to a variety of other applications [49]. The original inequality from [31] was tightened and
generalized in [49] to handle arbitrary random variables in the interval $[0, 1]$. Yet the inequality
from [49] did not work well for high degree polynomials and for random variables $f(X_1, \ldots, X_n)$
with small expectation. The follow-up work by Vu [50] handles the case of polynomials with small
expectation and extremely small smoothness parameters.

On the other side the concentration of polynomials of Gaussian and Rademacher random variables
has long been a subject of interest in Probability Theory. The moment and concentration inequalities
for polynomials of centered Gaussians are known as Hypercontractivity Inequalities [42, 24]. We
discuss various inequalities known in this setting and their connection to our results in Section 1.5.
Recently, the Hypercontractivity Inequalities and their "anti-concentration" counterparts found
many applications in Theoretical Computer Science and Machine Learning.

The above motivated us to study the moment and concentration inequalities for polynomials of 
independent random variables. We design a general method that works for a wide range of random 
variables including Gaussian, Boolean, exponential, Poisson and many others (see Section for more 
examples). We show that our method implies concentration inequalities for some previously open 
problems, e.g. permanent of random symmetric matrices. We also show that our main concentration
inequality is stronger than the well-known concentration inequality due to Kim and Vu [31]. On
the negative side we show that even for boolean random variables each term in our concentration
inequality is tight.

1.1 Our Results 

For a cleaner exposition we first describe our results in the restricted setting of multilinear polynomials
with non-negative coefficients. We are given a hypergraph $H = (V(H), \mathcal{H}(H))$ consisting of
a set $V(H) = \{1, 2, \ldots, n\} = [n]$ of vertices and a set $\mathcal{H}(H)$ of hyperedges. A hyperedge $h$ is a set
$h \subseteq V(H)$ of $|h| \le q$ vertices. We are also given a non-negative weight $w_h$ for each $h \in \mathcal{H}(H)$. For
each such weighted hypergraph, with weights $w_h$ on its hyperedges, we define a polynomial

$$f(x) = \sum_{h \in \mathcal{H}(H)} w_h \prod_{v \in h} x_v. \qquad (1.1)$$

Our smoothness parameters were strongly motivated by the average partial derivatives introduced
by Kim and Vu [31, 49]. For any $y \in \mathbb{R}^n$, hypergraph $H$, nonnegative weights $w$, and $h_0 \subseteq V(H)$
let

$$\mu(y, H, w, h_0) = \sum_{h \in \mathcal{H}(H) : h \supseteq h_0} w_h \prod_{v \in h \setminus h_0} |y_v|.$$

Note that $h_0$ need not be a hyperedge of $H$ and may even be the empty set. Also note that
$\mu(y, H, w, h_0)$ is equal to the $|h_0|$-th partial derivative of the polynomial $f(x)$ with respect to each
variable $x_v$ for $v \in h_0$, evaluated at the point $x = y$ if $y \in \mathbb{R}^n_{\ge 0}$. For a given collection of independent
random variables $Y = (Y_1, \ldots, Y_n)$, hypergraph $H$, integer $r \ge 0$ and nonnegative weights $w$, we
define

$$\mu_r = \mu_r(H, w) = \max_{h_0 \subseteq [n] : |h_0| = r} E[\mu(Y, H, w, h_0)] = \max_{h_0 \subseteq [n] : |h_0| = r} \Big( \sum_{h \in \mathcal{H}(H) : h \supseteq h_0} w_h \prod_{v \in h \setminus h_0} E[|Y_v|] \Big)$$

where we used the independence of the random variables $Y_v$ in the last equality. Sometimes we will
also use the notation $\mu_r(f) = \mu_r(H, w)$. Note that when the $Y_v$ are non-negative $\mu_r$ is equal to the
maximal expected partial derivative of order $r$ of the polynomial $f(x)$, which was the parameter
used in the Kim-Vu concentration inequalities [31, 49].
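As an illustration of the definition of $\mu_r$ in the multilinear case, here is a brute-force sketch. The representation (a dict from `frozenset` hyperedges to weights, and a dict `abs_mean` holding $E[|Y_v|]$ per vertex) and the function name `mu_r` are assumptions made for this example only; the enumeration over all $r$-subsets is exponential and only meant for tiny instances.

```python
import itertools
from math import prod


def mu_r(weights, abs_mean, r):
    """Max over h0 with |h0| = r of sum_{h >= h0} w_h * prod_{v in h \\ h0} E|Y_v|.

    weights:  dict mapping frozenset hyperedges to non-negative weights w_h
    abs_mean: dict mapping each vertex v to E|Y_v|
    """
    best = 0.0
    for h0 in itertools.combinations(sorted(abs_mean), r):
        h0 = frozenset(h0)
        total = sum(w * prod(abs_mean[v] for v in h - h0)
                    for h, w in weights.items() if h0 <= h)
        best = max(best, total)
    return best


if __name__ == "__main__":
    # f(x) = x0*x1 + 2*x1*x2 with E|Y_v| = 1/2 for every vertex.
    weights = {frozenset({0, 1}): 1.0, frozenset({1, 2}): 2.0}
    abs_mean = {0: 0.5, 1: 0.5, 2: 0.5}
    print([mu_r(weights, abs_mean, r) for r in range(3)])  # [0.75, 1.5, 2.0]
```

For this toy polynomial $\mu_0 = E[f]$ when the variables are non-negative, $\mu_1$ is attained at the shared vertex 1, and $\mu_2$ is the largest coefficient.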



Our concentration inequalities will hold for a general class of independent random variables includ- 
ing most classical ones. 

Definition 1.1 A random variable $Z$ is called moment bounded with parameter $L > 0$ if for any
integer $i \ge 1$,

$$E[|Z|^i] \le i \cdot L \cdot E[|Z|^{i-1}].$$

Roughly speaking a random variable $Z$ is moment bounded with parameter $L$ if $E[|Z|] \le L$ and the
tails of its distribution decay no slower than an exponentially distributed random variable's tails
do. Indeed, note that Definition 1.1 implies that any moment bounded random variable $Z$ satisfies
$E[|Z|^i] \le L^i \, i!$. Later in the paper we show that three large classes of random variables are moment
bounded: bounded, continuous log-concave, and discrete log-concave. For example the
Poisson, binomial, geometric, normal (i.e. Gaussian), and exponential distributions are all moment
bounded.
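The condition of Definition 1.1 can be checked numerically on a finite list of absolute moments. The sketch below does this for two cases whose moments are known in closed form: an exponential variable with rate $\rho$ has $E[Z^i] = i!/\rho^i$, so the condition holds with equality for $L = 1/\rho$, and a 0/1 variable with success probability $p$ has $E[Z^i] = p$ for all $i \ge 1$. The helper names are hypothetical, and checking finitely many moments is of course only a sanity check, not a proof.

```python
from math import factorial


def is_moment_bounded(abs_moments, L, rel_tol=1e-12):
    """Check E|Z|^i <= i * L * E|Z|^(i-1) for i = 1..k.

    abs_moments[i] must hold E|Z|^i, with abs_moments[0] == 1.
    """
    return all(abs_moments[i] <= i * L * abs_moments[i - 1] * (1 + rel_tol)
               for i in range(1, len(abs_moments)))


def exponential_moments(rate, k):
    """E[Z^i] = i! / rate^i for Z exponential with the given rate."""
    return [factorial(i) / rate ** i for i in range(k + 1)]


def bernoulli_moments(p, k):
    """E[Z^0] = 1 and E[Z^i] = p for i >= 1 when Z is 0/1 with Pr[Z = 1] = p."""
    return [1.0] + [p] * k


if __name__ == "__main__":
    print(is_moment_bounded(exponential_moments(2.0, 10), L=0.5))   # tight case
    print(is_moment_bounded(exponential_moments(2.0, 10), L=0.49))  # L too small
    print(is_moment_bounded(bernoulli_moments(0.3, 10), L=1.0))
```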

We prove the following: 

Theorem 1.2 We are given $n$ independent moment bounded random variables $Y = (Y_1, \ldots, Y_n)$
with the same parameter $L$. We are given a multilinear polynomial $f(x)$ with nonnegative coefficients
of total power$^1$ $q$. Let $f(Y) = f(Y_1, \ldots, Y_n)$. Then

$$\Pr[|f(Y) - E[f(Y)]| \ge \lambda] \le e^2 \cdot \max\left\{ \max_{r=1,\ldots,q} e^{-\frac{\lambda^2}{\mu_0 \mu_r \cdot L^r \cdot R^q}},\ \max_{r=1,\ldots,q} e^{-\left(\frac{\lambda}{\mu_r \cdot L^r \cdot R^q}\right)^{1/r}} \right\},$$

where $R \ge 1$ is some absolute constant.
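To see how the two families of terms in a bound of this shape interact, the following sketch evaluates $e^2 \cdot \max\{\cdot\}$ for given smoothness parameters. The function name `tail_bound` is hypothetical, and the value of the absolute constant $R$ is not specified in the text, so it is left as a caller-supplied parameter; any number plugged in is an assumption.

```python
import math


def tail_bound(lam, mu, L, R, q):
    """Evaluate e^2 * max over r of the two exponential terms, capped at 1.

    mu[r] holds mu_r for r = 0..q. Terms with mu_r == 0 contribute
    e^(-infinity) = 0 to the maximum and are skipped.
    """
    exponents = []
    for r in range(1, q + 1):
        if mu[r] <= 0:
            continue
        scale = mu[r] * L ** r * R ** q
        exponents.append(lam ** 2 / (mu[0] * scale))   # sub-Gaussian-type term
        exponents.append((lam / scale) ** (1.0 / r))   # heavy-tail-type term
    if not exponents:
        return 0.0
    # max of e^(-a) over all terms equals e^(-min a)
    return min(1.0, math.exp(2.0 - min(exponents)))


if __name__ == "__main__":
    # Hypothetical parameters: q = 2, mu_0 = 100, mu_1 = 10, mu_2 = 1, L = 1, R = 2.
    for lam in (10.0, 100.0, 1000.0):
        print(lam, tail_bound(lam, [100.0, 10.0, 1.0], L=1.0, R=2.0, q=2))
```

For small $\lambda$ the $\lambda^2/(\mu_0\mu_r)$ terms dominate, while for large $\lambda$ the $(\lambda/\mu_r)^{1/r}$ terms take over.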

We also show that Theorem 1.2 is the best possible bound as a function of these parameters, up
to logarithms in the exponent and dependence of the constants on the total power $q$. This lower
bound holds even for the well-studied special case where the random variables take the values
0 and 1 only, which we show later in the paper to be moment bounded with parameter 1.

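Theorem 1.3 For any $q \in \mathbb{N}$, real numbers $\mu_0^*, \mu_1^*, \ldots, \mu_q^* > 0$ and $\lambda > 0$ there exist independent
0/1 random variables $X = X_1, \ldots, X_n$ and a polynomial $f(x)$ of power $q$ such that $\mu_i(f) \le \mu_i^*$ for
all $0 \le i \le q$ and

$$\Pr[f(X) \ge E[f(X)] + \lambda] \ge \max_{r=1,\ldots,q} \max\left\{ e^{-\left(\frac{\lambda^2}{\mu_0^* \mu_r^*} + 1\right) \log C},\ e^{-\left(\left(\frac{\lambda}{\mu_r^*}\right)^{1/r} + 1\right) \log C} \right\} \qquad (1.2)$$

where $C = c_0 \Lambda_1^{c_1} \Lambda_2^{c_2} \Lambda_3^{c_3}$; $c_0$, $c_1$, $c_2$ and $c_3$ are absolute constants; $\Lambda_1 = \max_{0 \le i, j \le q} (\mu_i^* / \mu_j^*)$, $\Lambda_2 =
\max_{1 \le i \le q} \lambda / \mu_i^*$ and $\Lambda_3 = q^q$.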



$^1$ We reserve the more traditional terminology of "degree" for the number of neighbors of a vertex in a hypergraph.
We generalize Theorem 1.2 in two ways. Firstly, we allow negative coefficients. Secondly, we remove
the restriction for a polynomial to be multilinear, instead allowing each monomial to have total
power at most $q$ and maximal power of each variable at most $\Gamma$. For example, $X_1^2 X_2^4 X_3$ has total
power $q = 7$ and maximal variable power $\Gamma = 4$, and the multilinear case is when the maximal power is
$\Gamma = 1$. We defer the formal definition of general polynomials and the appropriate generalization of
$\mu_r$ to Section 1.4.



Our main result in this paper is the following: 

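Theorem 1.4 We are given $n$ independent moment bounded random variables $Y = (Y_1, \ldots, Y_n)$
with the same parameter $L$. We are given a general polynomial $f(x)$ of total power $q$ and maximal
variable power $\Gamma$. Let $f(Y) = f(Y_1, \ldots, Y_n)$. Then

$$\Pr[|f(Y) - E[f(Y)]| \ge \lambda] \le e^2 \cdot \max\left\{ \max_{r=1,\ldots,q} e^{-\frac{\lambda^2}{\mu_0 \mu_r \cdot L^r \cdot \Gamma^r \cdot R^q}},\ \max_{r=1,\ldots,q} e^{-\left(\frac{\lambda}{\mu_r \cdot L^r \cdot \Gamma^r \cdot R^q}\right)^{1/r}} \right\},$$

where $R \ge 1$ is some absolute constant.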



For large power polynomials the concentration bounds in Theorem 1.4 may not be
interesting due to the term $R^q$ in the exponent, yet we believe that the moment
computation method developed in this paper is useful even in this setting. We show two specific
examples where our method works. Our first example is a concentration inequality for permanents
of random matrices. The anti-concentration counterpart was recently studied by Aaronson and
Arkhipov in the Gaussian setting and by Tao and Vu in the setting with Rademacher
random variables.

Theorem 1.5 We are given an $n \times n$ matrix $A$ with random entries $Y_{ij}$ which are independent moment
bounded random variables with parameter $L = 1$ and $E[Y_{ij}] = 0$. Let $P(A)$ be the permanent of the
matrix $A$. Then

$$\Pr\left[|P(A)| \ge t \sqrt{n!}\right] \le \max\left\{ e^{-n},\ e^2 \cdot e^{-c \cdot t^{2/n}} \right\}$$

for some absolute constant $c > 0$ and parameter $t > 0$.
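The normalization by $\sqrt{n!}$ can be sanity-checked in the special case of Rademacher entries, which have mean zero and unit variance: expanding $P(A)^2$ as a double sum over permutations, only the diagonal terms survive, so $E[P(A)^2] = n!$. The sketch below verifies this exactly by enumerating all sign matrices for small $n$; the function names are hypothetical and the brute-force permanent is only practical for tiny matrices.

```python
import itertools
from math import factorial, prod


def permanent(a):
    """Permanent of a square matrix via the defining sum over permutations."""
    n = len(a)
    return sum(prod(a[i][s[i]] for i in range(n))
               for s in itertools.permutations(range(n)))


def exact_second_moment_rademacher(n):
    """Exact E[P(A)^2] for an n x n matrix of independent uniform +-1 entries."""
    total = 0
    for signs in itertools.product((-1, 1), repeat=n * n):
        a = [signs[i * n:(i + 1) * n] for i in range(n)]
        total += permanent(a) ** 2
    return total / 2 ** (n * n)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, exact_second_moment_rademacher(n), factorial(n))
```

Chebyshev's inequality applied to this second moment only gives a $1/t^2$ tail; the theorem above is about much faster decay in $t$.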

Our next example is an analogous Theorem for the permanent of a random symmetric matrix. 

Theorem 1.6 We are given an $n \times n$ symmetric matrix $A$ with random entries $Y_{ij}$ which are independent
moment bounded random variables for all pairs with $i \le j$ with parameter $L = 1$ and
$E[Y_{ij}] = 0$. Let $P(A)$ be the permanent of the matrix $A$. Then

$$\Pr\left[|P(A)| \ge t \sqrt{n!}\right] \le \max\left\{ e^{-n},\ e^2 \cdot e^{-c \cdot t^{2/n}} \right\}$$

for some absolute constant $c > 0$ and parameter $t > 0$.

Note that the above concentration inequalities can be easily derived from the Hypercontractivity
Inequality in the special case of Gaussian and Rademacher random variables (Theorem 1.9).



1.2 Applications in Randomized Rounding for Mathematical Programming Problems

As we noted, all current methods for proving concentration bounds for polynomials do not work well
for high power polynomials. Another feature that makes current concentration methods fail is low
expectation. One application where such concentration bounds could be applied is in the design and
analysis of randomized rounding algorithms for non-linear mathematical programming problems.

Many real-life optimization problems can be formulated using integer programming which is well- 
known to be computationally intractable (NP-hard). One way to solve such a problem both in 
theory and practice is to consider a linear programming relaxation, solve it using one of the standard 
methods and use the fractional optimal solution as a guidance in finding an integral solution of 
good quality. The seminal paper of Raghavan and Thompson [43] suggested rounding each boolean
variable to one with probability $x^*$ and to zero with probability $1 - x^*$, independently at random,
where $x^*$ is the optimal fractional solution. The analysis of such algorithms is based on applying
Chernoff Bounds to each constraint of the integer program separately and then applying a union 
bound over all the constraints. Such a method proved to be useful for a wide range of models and 
led to approximation algorithms that still have best known performance guarantees today. 
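The rounding step itself is short. The sketch below rounds a fractional vector independently coordinate by coordinate and then reports which linear constraints $\sum_j A_{ij} x_j \le b_i$ the rounded point violates. The names `randomized_round` and `violated_constraints`, and the small instance in the usage example, are hypothetical; solving the relaxation to obtain the fractional vector is outside the scope of this sketch.

```python
import random


def randomized_round(x_frac, rng):
    """Round each coordinate to 1 with probability x_frac[j], independently."""
    return [1 if rng.random() < x else 0 for x in x_frac]


def violated_constraints(A, b, x):
    """Return the indices i for which sum_j A[i][j] * x[j] > b[i]."""
    return [i for i, (row, bi) in enumerate(zip(A, b))
            if sum(a * xj for a, xj in zip(row, x)) > bi]


if __name__ == "__main__":
    rng = random.Random(42)
    x_frac = [0.5, 0.5, 0.25, 0.75]      # assumed fractional optimum
    A = [[1, 1, 0, 0], [0, 0, 1, 1]]     # two packing constraints
    b = [1, 1]
    x_int = randomized_round(x_frac, rng)
    print(x_int, violated_constraints(A, b, x_int))
```

In the analysis described above, a Chernoff bound controls the probability that any single constraint is violated by more than a given factor, and a union bound handles all constraints at once.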

A natural generalization of this framework is to apply it to non-linear optimization models. Many
such problems, e.g. those with convex quadratic constraints, are still computationally tractable if we
replace the constraint that variables must be boolean, $x_i \in \{0, 1\}$, with the continuous constraints
$0 \le x_i \le 1$. There are many real-life optimization problems with constraints and objective functions
modeled in such a way, e.g. we would like to optimize the congestion for a group of edges in a
multi-commodity flow problem in a "fair" way, i.e. we don't want one edge to get significantly higher
congestion than the others. The standard way to ensure that in practice is to optimize (or constrain) the
sum of squared congestions over the edges in that group. The constraints generated this way
are convex quadratic constraints, and continuous optimization problems with such constraints are
polynomially solvable.

To analyze the randomized rounding framework for such mathematical programming models one
needs to apply a concentration inequality to each non-linear constraint. If the size of the group of
edges for which we are trying to optimize the total congestion in a fair way is sub-logarithmic
and each edge in the fractional solution has constant congestion (a situation quite natural from
the application viewpoint) then our concentration inequalities would be the only available tool to
analyze such an algorithm.

1.3 Sketch of Our Methods 

Most concentration results for non-negative random variables are proven using Markov's inequality
as follows:

$$\Pr[Z \ge \lambda] = \Pr[g(Z) \ge g(\lambda)] \le \frac{E[g(Z)]}{g(\lambda)} \qquad (1.3)$$

where $Z$ is the random variable whose concentration we are trying to show and $g$ is either $g(z) = z^k$
for some positive even integer $k$, $g(z) = e^{tz}$ for some real $t > 0$, or some other non-negative increasing
function $g$. One then computes an upper bound on either the $k$th moment $E[g(Z)] = E[Z^k]$ or
the moment generating function $E[g(Z)] = E[e^{tZ}]$. Chernoff bounds are proven using moment
generating functions, so it would be most natural to use moment generating functions to prove our
bounds as well. Unfortunately the tails of the distribution of polynomials can be sufficiently large
to make the moment generating function $E[e^{tZ}]$ infinite for all $t > 0$. Kim and Vu worked around
this issue by applying (1.3) not to the polynomial itself but to various auxiliary random variables
with better behaved tails. Unfortunately a union bound over these auxiliary variables introduced
an extraneous factor logarithmic in the number of variables into their bounds (see Section 1.5 for
a comparison of our results to theirs). We avoid this issue by computing moments instead of the
moment generating function.
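As a small illustration of the moment version of (1.3), the sketch below bounds the tail of a non-negative variable by $\min_k E[Z^k]/\lambda^k$ and compares it with the exact tail for a standard exponential variable, where $E[Z^k] = k!$ and $\Pr[Z \ge \lambda] = e^{-\lambda}$. Since $Z \ge 0$ here, every positive integer $k$ is admissible, not only even ones. The function name `moment_tail_bound` is hypothetical.

```python
import math


def moment_tail_bound(lam, moment, k_max):
    """Best Markov bound min_k E[Z^k] / lam^k over k = 1..k_max, for Z >= 0.

    moment(k) must return E[Z^k].
    """
    return min(moment(k) / lam ** k for k in range(1, k_max + 1))


if __name__ == "__main__":
    # Z ~ Exp(1): E[Z^k] = k!, exact tail Pr[Z >= lam] = exp(-lam).
    for lam in (5.0, 10.0, 20.0):
        bound = moment_tail_bound(lam, math.factorial, 60)
        print(lam, bound, math.exp(-lam))
```

The optimized moment bound decays exponentially in $\lambda$, matching the true tail up to a polynomial factor, which is the kind of behavior the moment method is meant to capture.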

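We now give an instructive half-page bound on the second moment of a multilinear polynomial
$f(X) = \sum_{h \in \mathcal{H}} w_h \prod_{v \in h} X_v$ where all $E[X_v] = 0$, the $X_v$ are moment bounded with parameter $L$, and
all $h \in \mathcal{H}$ have $|h| = q$ and $w_h \ge 0$. Using definitions, linearity of expectation, and independence
we get

$$E[f(X)^2] = E\Big[\Big(\sum_{h_1 \in \mathcal{H}} w_{h_1} \prod_{v \in h_1} X_v\Big)\Big(\sum_{h_2 \in \mathcal{H}} w_{h_2} \prod_{v \in h_2} X_v\Big)\Big] = \sum_{h_1 \in \mathcal{H}} \sum_{h_2 \in \mathcal{H}} w_{h_1} w_{h_2} \prod_{v \in h_1 \cup h_2} E[X_v^{d_v}] \qquad (1.4)$$

where $d_v \in \{1, 2\}$ is the number of $h_i \ni v$. Now if $d_v = 1$ for any $v$ we have $E[X_v^{d_v}] = E[X_v] = 0$,
so the only non-zero terms of the sum (1.4) are those with $h_1 = h_2$. We therefore get

$$\begin{aligned}
E[f(X)^2] &= \sum_{h \in \mathcal{H}} w_h^2 \prod_{v \in h} E[X_v^2] \\
&\le \sum_{h \in \mathcal{H}} w_h^2 \prod_{v \in h} \big(2L \, E[|X_v|]\big) \\
&= (2L)^q \sum_{h \in \mathcal{H}} w_h^2 \prod_{v \in h} E[|X_v|] \\
&\le (2L)^q \mu_q \sum_{h \in \mathcal{H}} w_h \prod_{v \in h} E[|X_v|] \\
&= (2L)^q \mu_q \mu_0 \qquad (1.5)
\end{aligned}$$

where we used the fact that $E[X_v^2] \le 2L \, E[|X_v|]$ from moment boundedness and $w_h \le \max_h w_h = \mu_q$
from the definition of $\mu_q$. Combining (1.5) with Markov's inequality (1.3) yields

$$\Pr[|f(X)| \ge \lambda] \le \frac{E[f(X)^2]}{\lambda^2} \le \frac{(2L)^q \mu_0 \mu_q}{\lambda^2},$$

which is comparable to the $e^{-\lambda^2/(\mu_0 \mu_q (RL)^q)}$ term in Theorem 1.2 for small $\lambda$. In order to get
exponentially better bounds for larger $\lambda$ we will compute higher moments.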
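The second moment bound $E[f(X)^2] \le (2L)^q \mu_q \mu_0$ can be checked exactly on a toy instance. The sketch below assumes Rademacher variables, which have mean zero and are moment bounded with $L = 1$ since all their absolute moments equal 1, and a hypothetical polynomial with three hyperedges of size $q = 2$; in this case $\mu_0 = \sum_h w_h$ and $\mu_q = \max_h w_h$.

```python
import itertools
from math import prod


def second_moment_rademacher(weights, n):
    """Exact E[f(X)^2] for X uniform on {-1, +1}^n.

    weights: dict mapping frozenset hyperedges over range(n) to weights w_h,
    representing f(x) = sum_h w_h * prod_{v in h} x_v.
    """
    total = 0.0
    for x in itertools.product((-1, 1), repeat=n):
        fx = sum(w * prod(x[v] for v in h) for h, w in weights.items())
        total += fx * fx
    return total / 2 ** n


def second_moment_bound(weights, q, L=1.0):
    """(2L)^q * mu_q * mu_0, using E|X_v| = 1 for Rademacher variables."""
    mu_0 = sum(weights.values())
    mu_q = max(weights.values())
    return (2 * L) ** q * mu_q * mu_0


if __name__ == "__main__":
    weights = {frozenset({0, 1}): 1.0, frozenset({1, 2}): 2.0, frozenset({0, 3}): 0.5}
    print(second_moment_rademacher(weights, 4), second_moment_bound(weights, q=2))
```

Here the exact second moment equals $\sum_h w_h^2$, because the cross terms with $h_1 \ne h_2$ vanish exactly as in the derivation above.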

Now we outline what we do differently to handle higher moments and general polynomials. 

The first step is to express the polynomial $f$ over variables $Y_v$ as a sum of polynomials $g^{(1)}, \ldots, g^{(m)}$
over variables $Y_v^{\tau} - E[Y_v^{\tau}]$ for various $1 \le \tau \le q$. The main task is bounding the moments of each
of these polynomials. We later combine these bounds to get a bound on the moments of $f$. The variables
of each of the centered polynomials satisfy $E[Y_v^{\tau} - E[Y_v^{\tau}]] = 0$, which takes the place of $E[X_v] = 0$ in the
above special case. We also ensure that each $g^{(i)}$ has non-negative weights.

Bounding the moments of some $g^{(i)}$ begins by expanding $E[(g^{(i)})^k]$ similarly to (1.4), as a sum over $h_1, \ldots, h_k$.
As before only terms of the sum where every vertex $v$ occurs in $d_v \ge 2$ different hyperedges are
non-zero, but this is no longer equivalent to the simple condition $h_1 = h_2$.

We find it helpful to separate the structure of the hyperedges $h_1, \ldots, h_k$ from the identity of the
variables involved. We therefore generate $h_1, \ldots, h_k$ by composing two processes: first generate
$h_1, \ldots, h_k$ over vertex set $[\ell]$ for every $\ell \ge 1$ and then consider every possible embedding of those
artificial vertices into the vertex set $[n]$. For a fixed sequence of hyperedges over vertex set $[\ell]$ we
use arguments analogous to (1.5) to get a product of various $\mu_r$ and $L$. This bound is a function of
the number of connected components $c$ in $h_1, \ldots, h_k$. Finally we do some combinatorics to prove
a counting lemma on the number of possible $h_1, \ldots, h_k$ with vertex set $[\ell]$ with all degrees at least
two and $c$ connected components.

One additional complication is that we need to use moment boundedness to bound moments of
order much larger than the second moments $E[X_v^2] \le 2L \, E[|X_v|]$ we used in the above special case.
If we treated the factor that replaces that 2 as a constant, that would make the constant $R$ in our
final bounds linear in $q$ instead of an absolute constant. Fortunately these extra factors are small
for most of the possible $h_1, \ldots, h_k$, which enables our counting lemma to absorb these extra factors.

Our lower bounds are based on lower-bounding the concentration of certain concrete polynomials.
It is well known that Chernoff bounds are essentially tight, i.e. a sum of $n$ i.i.d. $0/M$ random
variables each with expected value $\mu/n$ has probability roughly $e^{-\lambda^2/(2\mu M)}$ of exceeding its mean
by $\lambda \le \mu$. Our lower bound of $e^{-\tilde{O}(\lambda^2/(\mu_0 \mu_r))}$ follows from a degree $q$ polynomial that acts like
this linear polynomial with $M = \mu_r$ and $\mu = \mu_0$. The idea behind the lower bound corresponding
to $e^{-\tilde{O}((\lambda/\mu_r)^{1/r})}$ is the fact that $\Pr[(\sum_i X_i)^r \ge \lambda] = \Pr[\sum_i X_i \ge \lambda^{1/r}] = e^{-\tilde{\Theta}(\lambda^{1/r})}$ where $\sum_i X_i$ is
binomially distributed with mean 1. Our lower bound uses similar arguments with a multilinearized
version of $(\sum_i X_i)^r$.

1.4 Definitions 

We now state the generalizations of the notations given in the introduction for general polynomials. 

A powered hypergraph $H$ consists of a set $V(H)$ of vertices and a set $\mathcal{H}(H)$ of powered hyperedges.
A powered hyperedge $h$ consists of a set $V(h) \subseteq V(H)$ of $|V(h)| = \eta(h)$ vertices and an $\eta(h)$-element
power vector $\tau(h)$ with one strictly positive integer component $\tau(h)_v = \tau_{hv}$ per vertex $v \in V(h)$.
We will hereafter omit the "powered" from "powered hypergraph" and "powered hyperedge" since
we have no need to refer to the basic hypergraphs used in the introduction. For any powered
hyperedge $h$ we let $q(h) = \sum_{v \in V(h)} \tau_{hv}$. For each such powered hypergraph $H$ and real-valued
weights $w_h$ for its hyperedges, we define a polynomial

$$f(x) = \sum_{h \in \mathcal{H}(H)} w_h \prod_{v \in V(h)} x_v^{\tau_{hv}}. \qquad (1.6)$$

The hyperedge $h$ corresponds to the monomial $\prod_{v \in V(h)} x_v^{\tau_{hv}}$. The parameters $q(h)$ and $\eta(h)$ will be
called the total power and cardinality of the hyperedge $h$ (or of the monomial corresponding to $h$). Let
$\Gamma = \max_{h \in \mathcal{H}(H),\, v \in V(h)} \tau_{hv}$ be the maximal power of a variable in the polynomial $f(x)$, e.g. $\Gamma = 1$ for
multilinear polynomials. We assume, by convention, that $\prod_{i \in \emptyset} x_i = 1$. Since the variables in our
polynomials are indexed by vertices in our hypergraphs we use the terms "variable" and "vertex"
interchangeably.
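One possible concrete encoding of these definitions, given purely as an illustration, represents a powered hyperedge as a dict from vertex to power and a polynomial as a list of (weight, hyperedge) pairs. All names below are hypothetical.

```python
from math import prod


def total_power(h):
    """q(h): the sum of the powers of the hyperedge h."""
    return sum(h.values())


def cardinality(h):
    """eta(h): the number of vertices in the hyperedge h."""
    return len(h)


def max_variable_power(poly):
    """Gamma: the largest power of any single variable in the polynomial."""
    return max((t for _, h in poly for t in h.values()), default=0)


def evaluate(poly, x):
    """f(x) = sum_h w_h * prod_{v in V(h)} x[v] ** tau_hv (empty product is 1)."""
    return sum(w * prod(x[v] ** t for v, t in h.items()) for w, h in poly)


if __name__ == "__main__":
    # f(x) = x0^2 * x1^4 * x2 + 3, where the constant is the empty hyperedge.
    poly = [(1.0, {0: 2, 1: 4, 2: 1}), (3.0, {})]
    h = poly[0][1]
    print(total_power(h), cardinality(h), max_variable_power(poly))  # 7 3 4
    print(evaluate(poly, [1.0, 2.0, 3.0]))                           # 51.0
```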

For powered hyperedges $h_1$ and $h_2$ (not necessarily hyperedges of a hypergraph) we write $h_1 \succeq h_2$
if $V(h_1) \supseteq V(h_2)$ and $\tau_{h_1 v} = \tau_{h_2 v}$ for all $v \in V(h_2)$. In the context of a hypergraph $H$ with vertex
set $[n]$ clear from context, for a given collection of independent random variables $Y = (Y_1, \ldots, Y_n)$,
integer $r \ge 0$ and weights $w$ we define

$$\mu_r(w, Y) = \max_{h_0} \Big( \sum_{h \in \mathcal{H}(H) : h \succeq h_0} |w_h| \prod_{v \in V(h) \setminus V(h_0)} E\big[|Y_v^{\tau_{hv}}|\big] \Big) \qquad (1.7)$$

where $h_0$ ranges over all possible powered hyperedges with vertices from $[n]$ with total power
$q(h_0) = r$. The cardinality of $h_0$ is not explicitly restricted but it cannot exceed $r$ since the
powers $\tau_{h_0 v}$ are strictly positive integers summing to $q(h_0)$. We will sometimes write $\mu_r(f, Y)$ for
polynomial $f(Y)$ instead of $\mu_r(w, Y)$ to emphasize the dependence on the polynomial $f(Y)$. If we write
$\mu_r(f)$ for a polynomial $f$ this means $\mu_r(w, Y)$ for the weight function $w$ and random variable vector
$Y$ corresponding to $f$ as in (1.6). If the polynomial is clear from context we write simply $\mu_r$. In
the special case that all coefficients are non-negative $\mu_r$ is upper bounded by the maximal expected
partial derivative of order $r$ of the polynomial $f(x)$, and this bound is loose for two reasons. First,
we do not have the multipliers depending on powers that are present in derivatives. Second, we
throw away some positive terms that are present in derivatives since we enforce $\tau_{hv} = \tau_{h_0 v}$ for all
$v \in V(h_0)$, whereas derivatives would consider all $h$ with $\tau_{hv} \ge \tau_{h_0 v}$. For example, for the polynomial
$Y_1^3 Y_2^3 + Y_2^2$ of non-negative random variables we have $\mu_2 = 1$ while the maximal expected second
partial derivative is equal to

$$\max\left\{ 2 + 6 E[Y_1^3] E[Y_2],\ 6 E[Y_1] E[Y_2^3],\ 9 E[Y_1^2] E[Y_2^2] \right\}.$$

Overall, our definition of smoothness is a bit tighter (although less natural) than the partial derivatives
used in [31, 49] that inspired it. We decided to use it since it naturally arises in our analysis.
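The generalized parameter $\mu_r$ can be computed by brute force on small instances. In the sketch below a powered hyperedge is a dict from vertex to power, a polynomial is a list of (weight, hyperedge) pairs, and `abs_moment(v, t)` returns $E[|Y_v^t|]$; these conventions and all names are assumptions made for this illustration. Only those $h_0$ obtained by restricting some hyperedge of the polynomial are enumerated, since any other $h_0$ contributes an empty sum. The usage example takes a hypothetical polynomial $Y_1^3 Y_2^3 + Y_2^2$ with standard exponential variables, for which $E[Y^t] = t!$.

```python
import itertools
from math import factorial, prod


def dominates(h, h0):
    """True if V(h) contains V(h0) and the powers of h and h0 agree on V(h0)."""
    return all(h.get(v) == t for v, t in h0.items())


def mu_r(poly, abs_moment, r):
    """Brute-force mu_r(w, Y) for a polynomial given as (weight, hyperedge) pairs."""
    candidates = []
    for _, h in poly:
        for k in range(len(h) + 1):
            for subset in itertools.combinations(sorted(h), k):
                if sum(h[v] for v in subset) == r:
                    candidates.append({v: h[v] for v in subset})
    best = 0.0
    for h0 in candidates:
        total = sum(abs(w) * prod(abs_moment(v, t)
                                  for v, t in h.items() if v not in h0)
                    for w, h in poly if dominates(h, h0))
        best = max(best, total)
    return best


if __name__ == "__main__":
    poly = [(1.0, {1: 3, 2: 3}), (1.0, {2: 2})]      # Y1^3 * Y2^3 + Y2^2

    def exp_moment(v, t):
        return float(factorial(t))                   # E[Y^t] = t! for Exp(1)

    print([mu_r(poly, exp_moment, r) for r in (0, 2, 3)])  # [38.0, 1.0, 6.0]
```

The value $\mu_2 = 1$ reflects that only the monomial $Y_2^2$ matches a powered hyperedge of total power 2 exactly, whereas a second partial derivative would also pick up contributions from $Y_1^3 Y_2^3$.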



1.5 Comparison with Known Concentration Inequalities 

There are many concentration inequalities dealing with the case when we are interested in a sum 
of weakly dependent random variables. The paper p9] provides a good survey and comparison of 
various inequalities for that setting. Below we will survey only known concentration inequalities for 
the case of polynomials of independent random variables. The previous works were dealing either 
with the case of boolean random variables, variables distributed in the interval [0,1], Gaussian 
random variables or log-concave random variables. 

1.5.1 Comparing with the Kim-Vu inequality 

Probably the most famous concentration inequality for polynomials is due to Kim and Vu [31], published
in 2000. There are many variants, extensions and equivalent formulations of that inequality.
We consider a variant from the survey paper by Vu [49] (Theorem 4.2 in Section 4.2).



Theorem 1.7 (Kim-Vu Concentration Inequality) Consider a polynomial $f(Y) = f(Y_1, \ldots, Y_n)$
with coefficients in the interval $[0,1]$. We denote by $\partial_A f(Y)$ a polynomial obtained from $f(Y)$ by
taking partial derivatives with respect to $A$, where $A$ is a multiset of indices, possibly with repetitions.
Let $Y_1, \ldots, Y_n$ be independent random variables with arbitrary distributions on the interval $[0,1]$.
Let $q$ be the degree of the polynomial $f(Y)$ and $E_j[f(Y)] = \max_{|A| \ge j} E[\partial_A f(Y)]$. Assume we are given
an integer $q' \le q$ and a collection of positive numbers $\mathcal{E}_0 > \mathcal{E}_1 > \cdots > \mathcal{E}_{q'} = 1$ and $\lambda$ satisfying

1. $\mathcal{E}_j \ge E_j[f(Y)]$ for $j = 0, \ldots, q'$;

2. $\mathcal{E}_j / \mathcal{E}_{j+1} \ge \lambda + 4j \log n$ for $j = 0, \ldots, q' - 1$;

then the following holds

$$\Pr\left[|f(Y) - E[f(Y)]| \ge c_q \sqrt{\lambda \mathcal{E}_0 \mathcal{E}_1}\right] \le d_q e^{-\lambda/4}$$

where $c_q \approx q^{q/2}$ and $d_q = 2^{q+1} - 2$ (see precise definitions in [49]).

This stronger version of the original Kim-Vu inequality [31] has a dependence on the parameter $q'$ which
could be helpful for some applications (see the discussion in [49]). We compare below our inequality
with the inequality in Theorem 1.7 when $q' = q$, which includes the original Kim-Vu inequality [31]
and is the most relevant variant in terms of various applications (our inequality does not seem to
be comparable with the general version of Theorem 1.7).



Re-writing our inequality from Theorem 1.4 in the same form we could derive

$$\Pr\left[ |f(Y) - E[f(Y)]| \ge \max_{r=1,\ldots,q} \max\left\{ \sqrt{\tau \mu_0 \mu_r \cdot L^r \cdot \Gamma^r \cdot R^q},\ \tau^r \mu_r \cdot L^r \cdot \Gamma^r \cdot R^q \right\} \right] \le e^2 \cdot e^{-\tau}$$

instead of the bound in Theorem 1.4, for any $\tau > 0$. Using the properties of the bounds $\mathcal{E}_j$ we derive
$\mathcal{E}_0 \ge \lambda^j E_j[f(Y)]$ and $\mathcal{E}_1 \ge \lambda^{j-1} E_j[f(Y)]$. In addition, as we already noticed, our definition of
smoothness is tighter than the one based on partial derivatives, i.e. $\mathcal{E}_j \ge E_j[f(Y)] \ge \mu_j$. Therefore,

$$\sqrt{\lambda \mathcal{E}_0 \mathcal{E}_1} \ge \max_{r=1,\ldots,q} \max\left\{ \sqrt{\lambda \mu_0 \mu_r},\ \lambda^r \mu_r \right\}.$$

Choosing $\tau = \lambda/4$, we obtain that the concentration inequality of Theorem 1.4 implies the inequality
from Theorem 1.7 (we don't explicitly specify the relationship between our absolute constant $R$
and the constants used in the definition of $c_q$ and $d_q$). We list below the various ways our inequality
generalizes or tightens the inequality from Theorem 1.7.



1. The bounds in our inequality do not depend on the total number of random variables n while 
all variants of the Kim-Vu inequality have this dependence due to the usage of the union 
bound in their proof. 

2. Our inequality covers a much wider range of random variables, including most commonly 
used ones not just the variables distributed in the interval [0, 1]. 

3. Our definition of smoothness, while being related to (and strongly motivated by) the smoothness
based on partial derivatives, is tighter and for some applications involving polynomials
with large $\Gamma$ will provide a better concentration bound.

4. Our bounds have a better dependence on the degree of the polynomials. We also introduce a
parameter $\Gamma$ that is the maximal power of a variable in a polynomial, which leads to substantially
tighter bounds for the most important special case of multilinear polynomials.



Another concentration inequality that appeared in the literature is due to Boucheron et al. [14] 
(Section 10). 

Theorem 1.8 Consider a multilinear degree $q$ polynomial $f(Y) = f(Y_1, \ldots, Y_n)$ of the independent
boolean random variables $Y_1, \ldots, Y_n$. Then

$$\Pr[f(Y) \ge E[f(Y)] + \lambda] \le e^2 \cdot \max\left\{ \max_{r=1,\ldots,q} e^{-\left(\frac{\lambda^2}{R^q \mu_0 \mu_r}\right)^{1/r}},\ \max_{r=1,\ldots,q} e^{-\left(\frac{\lambda}{R^q \mu_r}\right)^{1/r}} \right\}$$

for some absolute constant $R > 0$.



The second term in the maximum looks very similar to ours in Theorem 1.4 but the first term
is substantially higher due to the power $1/r$. Also their inequality does not seem to generalize to
the general class of random variables considered in Theorem 1.4. Note that Theorem 1.8 is just a
corollary of a moment inequality proved for much more general functions than polynomials.
1.5.2 Gaussian and Rademacher Random Variables



Another class of known concentration inequalities deals with the case when the random variables are
either centered (i.e. zero mean) Gaussians or variables that take value $+1$ or $-1$ with probability $1/2$
(such random variables are often called Rademacher random variables). The history of moment and
concentration inequalities in this setting is quite rich; we refer the reader to Lecture 16 in Ryan
O'Donnell's Lecture Notes on Boolean Analysis [42] or the book by S. Janson [24] (Sections V and
VI). We will call the moment and corresponding concentration inequalities the Hypercontractivity
Inequalities; for the formal proofs see Theorems 6.7 and 6.12 in [24].



Theorem 1.9 (Hypercontractivity Concentration Inequality) Consider a degree $q$ polynomial
$f(Y) = f(Y_1, \ldots, Y_n)$ of independent centered Gaussian or Rademacher random variables
$Y_1, \ldots, Y_n$. Then

$$\Pr[|f(Y) - E[f(Y)]| \ge \lambda] \le e^2 \cdot e^{-\left(\frac{\lambda^2}{R \cdot \mathrm{Var}[f(Y)]}\right)^{1/q}},$$

where $\mathrm{Var}[f(Y)]$ is the variance of the random variable $f(Y)$ and $R > 0$ is an absolute constant.

It is well-known that functions of Gaussian random variables are better concentrated around their
mean than, for example, functions of Boolean random variables even in such a simple case as a sum of
independent random variables. Therefore, in general we cannot expect to match the bound of
Theorem 1.9 in the setting of moment bounded random variables. Nevertheless, if $\mathrm{Var}[f(Y)] \approx \max_{r \in [q]} \mu_0 \mu_r$
(e.g. this happens when the power $q = O(1)$, the polynomial is multilinear, $\mu_r = O(1)$ for $r \in [q - 1]$ and
$w_h \in \{0, 1\}$ for all hyperedges $h$) and $\lambda \le \mu_0$ then Theorem 1.4 provides a better concentration
bound even in this setting.

An interesting concentration inequality for degree $q$ polynomials of centered Gaussian random
variables was recently proven by R. Latala [32]. This inequality generalizes the previously known
inequalities for the case when $q = 2$. The papers by Major [36] and Lehec [35] simplify and
explain Latala's proof. Latala uses certain smoothness parameters that seem to be natural only in
the setting of continuous random variables. We do not see a way to define similar smoothness
parameters in the setting of general moment bounded (or even boolean) random variables.

1.5.3 Log-Concave Random Variables 

We define log-concave random variables later in the paper and give many examples of such variables.
Latala and Lochowski [33] consider the setting with non-negative log-concave random variables and
multi-linear polynomials. Recently, Adamczak and Latala considered symmetric log-concave
random variables and polynomials of degree at most three, and symmetric exponential random
variables (or variables having a Laplace distribution) for polynomials of arbitrary degree. The main
drawback of the approach in [33] is that they estimate tails of random variables instead of estimating
the deviation from the mean, which is required in most applications. We can show that their
smoothness parameters can be derived from our $\mu_r$ (and the tail bounds) in the case of exponential
random variables or any random variables that are tight for our moment boundedness condition, i.e.
$E[|X|^i] \approx i L \, E[|X|^{i-1}]$.

1.6 Other Concentration Inequalities for Polynomials 

Another line of attack on understanding the concentration of polynomials is to use the structure of
polynomials and some smoothness parameters analogous to the partial derivatives or our parameters
$\mu_r$. Many of these known inequalities provide tight upper and lower bounds for moments but
involve hard to estimate smoothness parameters. In the case of Gaussian random variables tight
concentration bounds for polynomials of independent random variables were obtained by Hanson
and Wright [28] for $q = 2$ and by Borell [12] and Arcones and Gine for $q \ge 3$. In the case of
Rademacher random variables analogous results, based on different methods, were obtained by
Talagrand [44] for $q = 2$ and Boucheron et al. [14] for $q \ge 3$. Adamczak proved a concentration
inequality for general functions of a general class of random variables. All these results except
[28] and [44] (i.e. the case of quadratic polynomials) use parameters that involve expectations
of suprema of certain empirical processes that are in general not easy to estimate, which limits
the applicability of these inequalities.

Another interesting class of inequalities was obtained by using the so-called "needle decomposition method" in the field of Geometric Functional Analysis. It is a rich research area and we refer the reader to the survey paper by Nazarov, Sodin and Volberg. An interesting moment inequality which seems to generalize and tighten many previously known inequalities in this area was shown by Carbery and Wright (see Theorem 7). It implies the following concentration inequality via an application of Markov's inequality.

Theorem 1.10 Consider a degree $q$ polynomial $f(Y) = f(Y_1, \ldots, Y_n)$. Assume that random variables $Y_1, \ldots, Y_n$ are distributed according to some log-concave measure in $\mathbb{R}^n$ (i.e. they are not necessarily independent). Then



Pr[\f(Y)-E[f(Y)]\>X]<e 2 -e 
for some absolute constant R > 0. 



1/9 



On the one hand this inequality is extremely general and allows one to study such processes as sampling a point uniformly from the interior of a polytope in $\mathbb{R}^n$. On the other hand, due to its generality this inequality is weaker than ours even in the simple case of the sum of $n$ independent exponential random variables ($q = 1$). In this case our concentration inequality gives Chernoff-bound-like estimates while Theorem 1.10 provides a much weaker bound. Another drawback of Theorem 1.10 is that it does not handle discrete distributions.



1.7 Paper Outline 

We now outline the rest of this paper. 

In Section 2 we state and prove several lemmas about the moments of "centered" polynomials that form the heart of our results. In Section 3 we extend these lemmas to moments of arbitrary polynomials. In Section 4 we use these lemmas to prove our main Theorem 1.4. In Section 5



we prove a counting lemma used in Section g. We prove our permanent Theorems and |1.6| in 
Section ^. We prove our lower bound Theorem [II] in Section || We conclude with examples of 
moment bounded random variables in Section 0. 

In Appendix |A| we prove a special case of our main result: the linear case q = 1, i.e. concentration 
of a sum of independent moment-bounded random variables. This linear case of our theorem is not 
new, but the proof nicely illustrates many of our techniques with minimal technical complications. 
The interested reader may find it helpful to study the special cases in Section 1.2 and Appendix [A] 
before reading the main body of this paper.

2 Moment Lemma for Centered Polynomials



The proof of Theorem 1.4 will follow from applying Markov's inequality to an upper bound on the $k$-th moment of the polynomial in question. The first step is to look at moments of "centered" polynomials that replace $Y_v^{\tau_{hv}}$ with $(Y_v^{\tau_{hv}} - E[Y_v^{\tau_{hv}}])$. For simplicity the heart of our analysis will assume that all coefficients $w_h$ are non-negative; negative coefficients will return in Section 3.

Lemma 2.1 (Initial Moment Lemma) We are given a hypergraph $H = ([n], \mathcal{H})$, $n$ independent moment bounded random variables $Y = (Y_1, \ldots, Y_n)$ with the same parameter $L$ and a polynomial

$$g(y) = \sum_{h \in \mathcal{H}} w_h \prod_{v \in h} \left( y_v^{\tau_{hv}} - E\left[Y_v^{\tau_{hv}}\right] \right)$$

with nonnegative coefficients $w_h \ge 0$ such that every monomial (or hyperedge) $h \in \mathcal{H}$ has cardinality exactly $\eta$, total power exactly $q$ and maximal power upper bounded by $\Gamma$, i.e. $q(h) = q$, $\eta(h) = \eta$ and $\Gamma \ge \max_{v \in h} \tau_{hv}$. It follows that for any integer $k \ge 1$ we have

$$E\left[g(Y)^k\right] \le \max_{\nu} \left\{ R_2^{qk} \cdot L^{qk-\Delta} \cdot \Gamma^{qk-\Delta} \cdot k^{qk-(q-1)\nu_0-\Delta} \cdot \prod_{t=0}^{q} \mu_t^{\nu_t} \right\} \qquad (2.8)$$

where $R_2 \ge 1$ is some absolute constant, $\Delta = \sum_{t=0}^{q} (q-t)\nu_t$, and the maximum is over all non-negative integers $\nu_t$, $0 \le t \le q$, satisfying $\nu_0 \le \nu_q$, $\sum_{t=0}^{q} \nu_t = k$ and $qk - (q-1)\nu_0 - \Delta \ge 1$ (note also that the $\mu_t$ depend on the original (not centered) random variables $Y_v$ for $v \in [n]$).

Proof. Fix hypergraph $H = ([n], \mathcal{H})$, random variables $Y = (Y_1, \ldots, Y_n)$, non-negative weights $\{w_h\}_{h \in \mathcal{H}}$, integer $k \ge 1$, cardinality $\eta$ and total power $q$. Without loss of generality we assume that $\mathcal{H}$ is the complete hypergraph (setting additional edge weights to 0 as needed), i.e. $\mathcal{H}$ includes every possible hyperedge over vertex set $[n]$ with cardinality $\eta$, total power $q$, and maximal power at most $\Gamma$. A labeled hypergraph $G = (V(G), \mathcal{H}(G))$ consists of a set of vertices $V(G)$ and a sequence of $k$ (not necessarily distinct) hyperedges $\mathcal{H}(G) = h_1, \ldots, h_k$. In other words a labeled hypergraph is a hypergraph whose $k$ hyperedges are given unique labels from $[k]$. We write e.g. $\prod_{h \in \mathcal{H}(G)} w_h$ as a shorthand for $\prod_{i=1}^{k} w_{h_i}$ where $\mathcal{H}(G) = h_1, \ldots, h_k$; in particular duplicate hyperedges count multiple times in such a product.

Consider the sequence of hyperedges $h_1, \ldots, h_k \in \mathcal{H}$ from our original hypergraph $H$. These hyperedges define a labeled hypergraph $H(h_1, \ldots, h_k)$ with vertex set $\bigcup_{i=1}^{k} V(h_i)$ and hyperedge sequence $h_1, \ldots, h_k$. Note that the vertices of $H(h_1, \ldots, h_k)$ are labeled by the indices from $[n]$ and the edges are labeled by the indices from $[k]$. Note also that some hyperedges in $H(h_1, \ldots, h_k)$ could span the same set of vertices and have the same power vector, i.e. they are multiple copies of the same hyperedge in the original hypergraph $H$. Let $\mathcal{P}(H, k)$ be the set of all such edge and vertex labeled hypergraphs that can be generated by any $k$ hyperedges from $H$. We say that the degree of a vertex (in a hypergraph) is the number of hyperedges it appears in. Let $\mathcal{P}_2(H, k) \subseteq \mathcal{P}(H, k)$ be the set of such labeled hypergraphs where each vertex has degree at least two. We split the whole proof into more digestible pieces by subsections.

2.1 Changing the vertex labeling 

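In this section we will show how to transform the formula for the $k$-th moment into a summation over hypergraphs that have their own set of labels instead of being labeled by the set $[n]$. Let $X_{hv} = Y_v^{\tau_{hv}} - E[Y_v^{\tau_{hv}}]$ for $h \in \mathcal{H}$ and $v \in h$. By linearity of expectation, independence of the random variables $X_{hv}$ for different vertices $v \in V$ and the definition of $\mathcal{P}(H, k)$ we obtain

$$\begin{aligned}
E\left[g(Y)^k\right] &= \sum_{h_1, \ldots, h_k \in \mathcal{H}} E\left[ \prod_{i=1}^{k} \left( w_{h_i} \prod_{v \in V(h_i)} X_{h_i v} \right) \right] \\
&= \sum_{G \in \mathcal{P}(H,k)} E\left[ \prod_{h \in \mathcal{H}(G)} \left( w_h \prod_{v \in V(h)} X_{hv} \right) \right] \\
&= \sum_{G \in \mathcal{P}(H,k)} \left( \prod_{h \in \mathcal{H}(G)} w_h \right) \prod_{v \in V(G)} E\left[ \prod_{h \in \mathcal{H}(G) \,|\, v \in V(h)} X_{hv} \right] \\
&= \sum_{G \in \mathcal{P}_2(H,k)} \left( \prod_{h \in \mathcal{H}(G)} w_h \right) \prod_{v \in V(G)} E\left[ \prod_{h \in \mathcal{H}(G) \,|\, v \in V(h)} X_{hv} \right] \\
&\le \sum_{G \in \mathcal{P}_2(H,k)} \left( \prod_{h \in \mathcal{H}(G)} w_h \right) \prod_{v \in V(G)} \left| E\left[ \prod_{h \in \mathcal{H}(G) \,|\, v \in V(h)} X_{hv} \right] \right| \qquad (2.9)
\end{aligned}$$

where the last equality follows from the fact that $E[X_{hv}] = 0$ for all $h \in \mathcal{H}$ and $v \in V(h)$. Below we will use the notation

$$\Lambda_v(G) = \left| E\left[ \prod_{h \in \mathcal{H}(G) \,|\, v \in V(h)} X_{hv} \right] \right|.$$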

Note that a labeled hypergraph $G \in \mathcal{P}_2(H, k)$ could have the number of vertices ranging from $\eta$ up to $k\eta/2$ since every vertex has degree at least two. For $q$, $\eta$ and $\Gamma$ clear from context, let $S_2(k, \ell)$ be the set of labeled hypergraphs with vertex set $[\ell]$ having $k$ hyperedges such that each hyperedge has cardinality exactly $\eta$, total power $q$, maximal power at most $\Gamma$, and every vertex has degree at least 2. For each hypergraph $G \in S_2(k, \ell)$ the vertices are labeled by the indices from the set $[\ell]$ and the edges are labeled by the indices from the set $[k]$. Let $M(S)$ for $S \subseteq [\ell]$ be the set of all possible injective functions $\pi : S \to [n]$; in particular $M([\ell])$ is the set of all possible injective functions $\pi : [\ell] \to [n]$. We will use the notation $\pi(h)$ for a copy of hyperedge $h = (V(h), \tau(h)) \in \mathcal{H}(G)$ with its vertices relabeled by the injective function $\pi$, i.e. $V(\pi(h)) = \{\pi(v) : v \in V(h)\}$ and $\tau_{\pi(h)\pi(v)} = \tau_{hv}$. Analogously we will use the notation $\pi(G)$ to denote the hypergraph $G$ with vertices relabeled according to the function $\pi$. We claim that



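$$\sum_{G \in \mathcal{P}_2(H,k)} \left( \prod_{h \in \mathcal{H}(G)} w_h \right) \left( \prod_{v \in V(G)} \Lambda_v(G) \right) = \sum_{\ell=\eta}^{k\eta/2} \frac{1}{\ell!} \sum_{G' \in S_2(k,\ell)} \sum_{\pi \in M([\ell])} \left( \prod_{h \in \mathcal{H}(G')} w_{\pi(h)} \right) \left( \prod_{v \in V(G')} \Lambda_{\pi(v)}(\pi(G')) \right). \qquad (2.10)$$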



Indeed, every labeled hypergraph $G = (V(G), \mathcal{H}(G)) \in \mathcal{P}_2(H, k)$ on $\ell$ vertices has $\ell!$ labeled hypergraphs $G' = (V(G'), \mathcal{H}(G')) \in S_2(k, \ell)$ that differ from $G$ by vertex labellings only. Each of those hypergraphs has one corresponding mapping $\pi$ that maps its $\ell$ vertex labels into the vertex labels of the hypergraph $G \in \mathcal{P}_2(H, k)$.



Then, combining (2.9) and (2.10) we obtain

$$E\left[g(Y)^k\right] \le \sum_{\ell=\eta}^{k\eta/2} \frac{1}{\ell!} \sum_{G' \in S_2(k,\ell)} \sum_{\pi \in M([\ell])} \left( \prod_{h \in \mathcal{H}(G')} w_{\pi(h)} \right) \left( \prod_{v \in V(G')} \Lambda_{\pi(v)}(\pi(G')) \right). \qquad (2.11)$$

2.2 Estimating the term for each hypergraph G'



We now fix an integer $\ell$ and a labeled hypergraph $G' \in S_2(k, \ell)$. Let $c$ be the number of connected components in $G'$, i.e. $c$ is the maximal number such that the vertex set $V(G')$ can be partitioned into $c$ parts $V_1, \ldots, V_c$ such that for each hyperedge $h \in \mathcal{H}(G')$ and any $j \in [c]$, if $V(h) \cap V_j \ne \emptyset$ then $V(h) \subseteq V_j$. Intuitively, we can split the vertex set of $G'$ into $c$ components such that there are no hyperedges that have vertices in two or more components. For each vertex $v \in V(G')$, we define $D_v = \sum_{h \in \mathcal{H}(G') \,|\, v \in V(h)} \tau_{hv}$ to be the sum of all the powers that correspond to the vertex $v$ and hyperedges that are incident to $v$. We call $D_v$ the total power of $v$. Let $d_v$ denote the number of hyperedges $h \in \mathcal{H}(G')$ with $v \in V(h)$. We will call $d_v$ the degree of the vertex $v$. By definition $\sum_{v \in V(G')} d_v = \eta k$, $\sum_{v \in V(G')} D_v = qk$ and $d_v \ge 2$ for all $v \in V(G')$.

We consider a certain canonical ordering $h^{(1)}, \ldots, h^{(k)}$ of the hyperedges in $\mathcal{H}(G')$ that will be specified later in Lemma 2.4. (This ordering is distinct from and should not be confused with the ordering of the hyperedges inherent in a labeled hypergraph.) We iteratively remove hyperedges from the hypergraph $G'$ in this order. Let $G'_s = (V'_s, \mathcal{H}'_s)$ be the hypergraph defined by the hyperedges $\mathcal{H}'_s = h^{(s)}, \ldots, h^{(k)}$ and vertex set $V'_s = \bigcup_{h \in \mathcal{H}'_s} V(h)$. In particular $G'_1$ is identical to $G'$ except for the order of the hyperedges. Let $V_s$ be the vertices of the hyperedge $h^{(s)}$ that have degree one in the hypergraph $G'_s$, i.e. $V'_{s+1} = V'_s \setminus V_s$. By definition, $0 \le |V_s| \le \eta$. For each vertex $v \in V_s$, let $\delta_v = \tau_{h^{(s)}v}$, i.e. $\delta_v$ is the power corresponding to the vertex $v$ and the last hyperedge in the defined order that is incident to $v$. We call $\delta_v$ the last power of $v$. Intuitively, we delete edges in the order $h^{(1)}, \ldots, h^{(k)}$. We also delete all vertices that become isolated. Then $V_s$ is the set of vertices that get deleted during step $s$ of this process.

Recall that $[\ell] = V(G')$ and $V'_s = V(G') \setminus \bigcup_{t=1}^{s-1} V_t$ for $s = 1, \ldots, k$. Then



e n 1 1 n a ^mg')) 



E E 



n 

t'£M(v: +1 ) TreM(Vi) \heH' s+1 

s.t. 7r extends w' 



«V(/i) 

K h€H' s+1 



II K(v){k(G')) ] J io w ( fc wj JJ A 7r(u )(7r(G ,/ )) 



n vw we))) e ( ^(hW) n ^wg')) ) 



s.t. 7r extends tt' 



where we say that $\pi$ extends $\pi'$ if $\pi(v) = \pi'(v)$ for every $v$ in the domain of $\pi'$. By Lemma 2.3, which is an implication of the moment boundedness of random variables (Section 2.5), we have



$$\Lambda_{\pi(v)}(\pi(G')) \le 2^{d_v} L^{D_v - \delta_v} \cdot \frac{D_v!}{\delta_v!} \cdot E\left[ \left| Y_{\pi(v)} \right|^{\tau_{\pi(h^{(s)})\pi(v)}} \right]$$

since $\delta_v = \tau_{\pi(h^{(s)})\pi(v)}$ (the vertex degrees and powers do not depend on vertex labeling). Therefore,



E 



ireM(V' s ) 
s.t. it extends n' 



n 

V&Vs 



veVs 



e ^(/.w) n e 



TreM(V^) 
s.t. 7r extends n' 



Y 



T 7r(h( s ))7r(ti) 

n(v) 



We now group the sum over $\pi$ by the value of $\pi(h^{(s)}) = h \in \mathcal{H}$. Note that for any fixed mapping $\pi' \in M(V'_{s+1})$ there are at most $|V_s|!$ possible mappings $\pi \in M(V'_s)$ that extend $\pi'$ and map the vertex labels of hyperedge $h^{(s)} \in G'$ into the vertex labels of the hyperedge $h \in \mathcal{H}$. Let $h^{(s)} \setminus V_s$ denote $h^{(s)}$ but with the vertices $V_s$ removed, i.e. $V(h^{(s)} \setminus V_s) = V(h^{(s)}) \setminus V_s$ and $\tau_{(h^{(s)} \setminus V_s)v} = \tau_{h^{(s)}v}$ for all $v \in V(h^{(s)} \setminus V_s)$. Let $h' = \pi'(h^{(s)} \setminus V_s)$, which is the portion of $\pi(h^{(s)})$ that is fixed by $\pi'$. Recall, we write $h \succeq h'$ if $V(h) \supseteq V(h')$ and $\tau_{hv} = \tau_{h'v}$ for all $v \in V(h')$. Also recall the notation $\delta_v = \tau_{h^{(s)}v}$ for $v \in V_s$. Then



E 



^(/iW) n e 

s.t. 7r extends w' 



Y 



7r(v) 



< \v a \\ wh n E [i y - h "i 

h£H\hth' ueV(h)\V(h') 



max 



h'\q(h')=q-j: veVs 8 V 



e ^ n E i\ Y v hv 

h£H\hth' v£V(h)\V(h>) 



We repeat the argument for s = 1, . . . , k. In the end we obtain 



e n w «w n A *(vM G> )) 

ir€M{[£\) \heH(ir(G')) J \weV(G') 

e ( n ^w] ( n a ^)( g ')| < 



_ orfu tD v —S v . n I \ JiL / 

n - n(ww.<. 



2»?fc j^qk — A 



l«6V(G') 



i=0 
9 



n Tr h' n«' 



i=0 



(2.12) 



where $\nu_t$ is the number of indices $s = 1, \ldots, k$ with $q - \sum_{v \in V_s} \delta_v = t$, $\mu_t = \mu_t(w, Y)$, and $\Delta = \sum_{v \in V(G')} \delta_v$. In the last inequality we used the fact that $\sum_{s=1}^{k} |V_s| = \ell$ and $|V_s| \le \eta$. The quantities $\nu_t$ must satisfy the equality $\sum_{t=0}^{q} (q-t)\nu_t = \sum_{v \in V(G')} \delta_v = \Delta$.

The flexibility in the choice of the ordering $h^{(1)}, \ldots, h^{(k)}$ affects the quality of the bound (2.12) via its influence on the $\nu_t$, $\delta_v$ and $\Delta$. We focus on minimizing $\nu_0$, which intuitively makes sense as $\mu_0$ is often much larger than the other $\mu_t$. The last hyperedge in each of the $c$ connected components must contribute to $\nu_0$, so $\nu_0 \ge c$. It turns out that equality $\nu_0 = c$ is achievable; the intuition is to pick an ordering that never splits a connected component of $G'_s$ into several components. We defer the proof that such an ordering exists to Lemma 2.4 in Section 2.5. We also know that $\nu_q \ge c$ because the first hyperedge in each connected component is incident to vertices of degree two or more only and therefore contributes to $\nu_q$.

2.3 Using the Counting Lemma 

We assume that each hypergraph in $S_2(k, \ell)$ has an associated canonical ordering of hyperedges, formally defined in Lemma 2.4. This canonical ordering specifies the last powers for all vertices in the hypergraph and the values $\nu_t$ for $t = 0, \ldots, q$.

We decompose $S_2(k, \ell)$ as $S_2(k, \ell) = \bigcup_{c,\, d \ge \mathbf{2},\, D,\, \delta} S(k, \ell, c, d, D, \delta)$ where $\mathbf{2}$ is a vector of $\ell$ twos and $S(k, \ell, c, d, D, \delta)$ is the set of vertex and hyperedge labeled hypergraphs with vertex set $[\ell]$ and $k$ hyperedges such that each hyperedge has cardinality $\eta$, total power $q$, maximal power at most $\Gamma$, the number of connected components is $c$, the degree vector is $d$, the total power vector is $D$, and $\delta$ is the vector of last powers (corresponding to the canonical ordering). Note that $S(k, \ell, c, d, D, \delta)$ depends on $q$, $\eta$ and $\Gamma$ as well. Let $\nu = (\nu_0, \ldots, \nu_q)$. Combining (2.11) and (2.12) we obtain



E 



9<XT 



k v /2 e/n 
— ~i\ 



E 



=rj c=l d>2,D,S G'eS(k s i,c } d,D,5) 



n 

yt=0 



) v e n 



S v l 



< max _ { ^ • I • i. (2^f ■ \S(k, £, c, d, D, 5)\ • 2^L^ (f[ A rf ft ^ 



k£ 

< max < — 

£,c,d>2,D,5,9 2 • £\ 



. 2 H<lk+e) . 2 Vk L qk-A . . R qk _ ^gk-A-i . ^qk-{q-l)c-A+i . | ^ 



\S(k,£,c,d,D,S)\ n Tf r < this 
by counting Lemma g/lj (Section ^) 



^=0 




where the maximum over $\nu$ is over $\nu_0, \ldots, \nu_q \ge 0$ with $c = \nu_0 \le \nu_q$, $\sum_{t=0}^{q} \nu_t = k$ and $\sum_{t=0}^{q} (q-t)\nu_t = \Delta$. Also the integers $\nu_0, \ldots, \nu_q$ must satisfy the inequality $(q-1)k - \Delta \ge (q-2)\nu_0$ by Corollary 2.6. The second inequality follows from the fact that the total number of feasible total power vectors $D$ is at most $2^{qk+\ell}$ ($qk$ is the sum of all the powers and we need to compute the total number of partitions of the array with $qk$ entries into $\ell$ possible groups of consecutive entries, which is $\binom{qk+\ell-1}{\ell-1}$), and similarly the number of vectors $d$ and $\delta$ can also be upper-bounded by $2^{qk+\ell}$. We substitute $\nu_0$ for $c$, and remove the unreferenced variables $c$, $d$, $D$ and $\delta$ from the maximum. The maximum over $\nu$ is now over $\nu_0, \ldots, \nu_q \ge 0$ with $\nu_0 \le \nu_q$, $\sum_{t=0}^{q} \nu_t = k$ such that $(q-1)k - \Delta \ge (q-2)\nu_0$. We continue



E 



g(Y) h 



< max 

Lv 



< max 
l.p 



kt 



2^{qk+l) . ypkjqk-A _ ^£ _ ^.gfc _ pjfc-A-£ _ y,qk-{q-\)v - A+£ _ j j~J 



^=0 



max < i? 



1 



< max ( R q 2 k L^ A ■ T qk ~ A ■ fctf-(ff-i)*>-A . j J 



(2.13) 



where $R_0 \le R_1 \le R_2$ are some absolute constants, the second inequality uses the facts that $\ell! \ge (\ell/e)^{\ell}$ and $\Gamma^{-\ell} \le 1$, and the last inequality is implied by the fact that

$$\left( \frac{k\eta}{\ell} \right)^{\ell} \le \max_{x > 0} \left( \frac{k\eta}{x} \right)^{x} = e^{k\eta/e}.$$

Inequality (2.13) is precisely the inequality (2.8) that we needed to prove. Note that the inequality $(q-1)k - \Delta \ge (q-2)\nu_0$ implies $qk - (q-1)\nu_0 - \Delta \ge k - \nu_0 = k - c \ge c \ge 1$ (we use the fact that each connected component has at least two hyperedges). ■
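The elementary fact used in the last step, $\max_{x>0} (k\eta/x)^x = e^{k\eta/e}$ (attained at $x = k\eta/e$), can be sanity-checked numerically. The following is a minimal sketch; the function names and the grid resolution are illustrative choices, not part of the paper.

```python
import math

def log_term(k_eta, x):
    """Natural logarithm of (k_eta / x) ** x for x > 0."""
    return x * math.log(k_eta / x)

def grid_max_log(k_eta, steps=20000):
    """Maximum of log_term over a fine grid of x in (0, 3 * k_eta]."""
    grid = (3.0 * k_eta * (i + 1) / steps for i in range(steps))
    return max(log_term(k_eta, x) for x in grid)

def closed_form_log(k_eta):
    """log of the claimed maximum e ** (k_eta / e)."""
    return k_eta / math.e
```

For any integer $\ell \ge 1$ the value `log_term(k_eta, ell)` should therefore not exceed `closed_form_log(k_eta)`, which is exactly how the bound is applied to $(k\eta/\ell)^{\ell}$.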

2.4 Intermediate moment lemma 

Lemma 2.2 (Intermediate Moment Lemma) We are given $n$ independent moment bounded random variables $Y = (Y_1, \ldots, Y_n)$ with the same parameter $L$ and a general polynomial $f(x)$ with nonnegative coefficients such that every monomial (or hyperedge) $h \in \mathcal{H}$ has exactly $\eta$ variables, total power exactly $q$ and power of any variable upper bounded by $\Gamma$, i.e. $q(h) = q$, $\eta(h) = \eta$ and $\Gamma = \max_{h \in \mathcal{H}, v \in h} \tau_{hv}$. Then

$$E\left[g(Y)^k\right] \le \max\left\{ \left( \sqrt{k R_3^q \Gamma^q L^q \mu_q \mu_0} \right)^k,\ \max_{t \in [q]} \left( k^t R_3^q \Gamma^t L^t \mu_t \right)^k \right\} \qquad (2.14)$$

where $R_3 \ge 1$ is some absolute constant, $g(Y)$ is the polynomial of centered random variables defined in Lemma 2.1 and $[q] = \{1, \ldots, q\}$.



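Proof. We apply Lemma 2.1. Since $\sum_{t=0}^{q} \nu_t = k$ and $\sum_{t=0}^{q} (q-t)\nu_t = \Delta$ we have

$$\begin{aligned}
q(\nu_q - \nu_0) + \nu_0 + \sum_{t=1}^{q-1} t\nu_t &= -(q-1)\nu_0 + \sum_{t=0}^{q} t\nu_t \\
&= q\left( k - \sum_{t=0}^{q} \nu_t \right) - (q-1)\nu_0 + \sum_{t=0}^{q} t\nu_t \\
&= qk - (q-1)\nu_0 - \sum_{t=0}^{q} (q-t)\nu_t \\
&= qk - (q-1)\nu_0 - \Delta,
\end{aligned}$$

and similarly

$$q(\nu_q - \nu_0) + q\nu_0 + \sum_{t=1}^{q-1} t\nu_t = \sum_{t=0}^{q} t\nu_t = qk - \Delta.$$

Therefore,

$$\begin{aligned}
&\max_{\nu} \left\{ R_2^{qk} L^{qk-\Delta} \cdot \Gamma^{qk-\Delta} \cdot k^{qk-(q-1)\nu_0-\Delta} \cdot \prod_{t=0}^{q} \mu_t^{\nu_t} \right\} \\
&\quad = \max_{\nu} \left\{ \left( k^q R_2^q \Gamma^q L^q \mu_q \right)^{\nu_q - \nu_0} \cdot \left( k R_2^{2q} \Gamma^q L^q \mu_q \mu_0 \right)^{\nu_0} \cdot \prod_{t=1}^{q-1} \left( k^t R_2^q \Gamma^t L^t \mu_t \right)^{\nu_t} \right\} \\
&\quad \le \max_{\nu} \left\{ \left( k^q R_3^q \Gamma^q L^q \mu_q \right)^{\nu_q - \nu_0} \cdot \left( k R_3^q \Gamma^q L^q \mu_q \mu_0 \right)^{\nu_0} \cdot \prod_{t=1}^{q-1} \left( k^t R_3^q \Gamma^t L^t \mu_t \right)^{\nu_t} \right\}
\end{aligned}$$

for the absolute constant $R_3 = R_2^2$. Using the facts that $\sum_{t=0}^{q} \nu_t = k$ and $\nu_q \ge \nu_0$ again, we derive

$$\begin{aligned}
E\left[g(Y)^k\right] &\le \max_{\nu} \left\{ \left( k^q R_3^q \Gamma^q L^q \mu_q \right)^{\nu_q - \nu_0} \cdot \left( \sqrt{k R_3^q \Gamma^q L^q \mu_q \mu_0} \right)^{2\nu_0} \cdot \prod_{t=1}^{q-1} \left( k^t R_3^q \Gamma^t L^t \mu_t \right)^{\nu_t} \right\} \\
&\le \max\left\{ \left( k^q R_3^q \Gamma^q L^q \mu_q \right)^k,\ \left( \sqrt{k R_3^q \Gamma^q L^q \mu_q \mu_0} \right)^k,\ \max_{t \in [q-1]} \left( k^t R_3^q \Gamma^t L^t \mu_t \right)^k \right\} \\
&= \max\left\{ \left( \sqrt{k R_3^q \Gamma^q L^q \mu_q \mu_0} \right)^k,\ \max_{t \in [q]} \left( k^t R_3^q \Gamma^t L^t \mu_t \right)^k \right\}.
\end{aligned}$$

Here the second inequality holds because the exponents $(\nu_q - \nu_0) + 2\nu_0 + \sum_{t=1}^{q-1} \nu_t$ sum to $k$. ■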
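The two exponent identities that drive this regrouping (one for the power of $k$, one for the common power of $R$, $\Gamma$ and $L$) are purely arithmetic, so they can be verified by brute force over small vectors $\nu$. The sketch below assumes only the definitions $k = \sum_t \nu_t$ and $\Delta = \sum_t (q-t)\nu_t$; the function names are illustrative.

```python
from itertools import product

def exponent_identities_hold(nu):
    """Check both exponent identities for one vector nu = (nu_0, ..., nu_q)."""
    q = len(nu) - 1
    k = sum(nu)
    delta = sum((q - t) * nu[t] for t in range(q + 1))
    middle = sum(t * nu[t] for t in range(1, q))  # sum over t = 1, ..., q-1
    k_exponent = q * (nu[q] - nu[0]) + nu[0] + middle
    common_exponent = q * (nu[q] - nu[0]) + q * nu[0] + middle
    return (k_exponent == q * k - (q - 1) * nu[0] - delta
            and common_exponent == q * k - delta)

def check_all(q, max_entry):
    """Check the identities for every nu in {0, ..., max_entry}^(q+1)."""
    return all(exponent_identities_hold(nu)
               for nu in product(range(max_entry + 1), repeat=q + 1))
```

The identities hold for every integer vector; the constraint $\nu_0 \le \nu_q$ is needed only so that the exponent $\nu_q - \nu_0$ in the regrouped product is non-negative.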



2.5 Three Technical Lemmas 



In this section we prove technical lemmas that were used in the proof of Lemma 2.1 



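Lemma 2.3 For any moment bounded random variable $Z$ with parameter $L$, integer $k \ge 1$, set $S \subseteq [k]$ and a collection of positive integer powers $d_t$ for $t \in S$, the following inequalities hold:

$$\left| E\left[ \prod_{t \in S} Z^{t d_t} \right] \right| \le \min_{t \in S} \left\{ L^{D-t} \cdot \frac{D!}{t!} \cdot E\left[ |Z|^t \right] \right\}$$

and

$$\left| E\left[ \prod_{t \in S} \left( Z^t - E[Z^t] \right)^{d_t} \right] \right| \le \min_{t \in S} \left\{ 2^d L^{D-t} \cdot \frac{D!}{t!} \cdot E\left[ |Z|^t \right] \right\}$$

where $D = \sum_{t \in S} t d_t$ and $d = \sum_{t \in S} d_t$.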

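Proof. To prove the first inequality note that for any $\tau \in S$ by Jensen's inequality we have

$$\left| E\left[ \prod_{t \in S} Z^{t d_t} \right] \right| = \left| E\left[ Z^D \right] \right| \le E\left[ |Z|^D \right] \le L^{D-\tau} \cdot \frac{D!}{\tau!} \cdot E\left[ |Z|^{\tau} \right] \qquad (2.15)$$

where the final inequality follows from applying Definition 1.1 $D - \tau$ times.

To show the second inequality we bound

$$\begin{aligned}
\left| E\left[ \prod_{t \in S} \left( Z^t - E[Z^t] \right)^{d_t} \right] \right|
&\le \prod_{t \in S} \left( E\left[ \left| Z^t - E[Z^t] \right|^{D/t} \right] \right)^{t d_t / D} \\
&\le \prod_{t \in S} \left( E\left[ \left( |Z|^t + E[|Z|^t] \right)^{D/t} \right] \right)^{t d_t / D} \\
&\le \prod_{t \in S} \left( E\left[ 2^{D/t - 1} \left( |Z|^D + \left( E[|Z|^t] \right)^{D/t} \right) \right] \right)^{t d_t / D} \\
&\le \prod_{t \in S} \left( 2^{D/t} E\left[ |Z|^D \right] \right)^{t d_t / D} \\
&= 2^d E\left[ |Z|^D \right] \\
&\le 2^d L^{D-\tau} \cdot \frac{D!}{\tau!} \cdot E\left[ |Z|^{\tau} \right]
\end{aligned}$$

where the first inequality uses Hölder's Inequality (see Lemma 3.1), the third inequality uses the fact (which follows from convexity) that $(x + y)^p \le 2^{p-1}(x^p + y^p)$ for any $p \ge 1$ and $x, y \ge 0$ (in particular $x = |Z|^t$ and $y = E[|Z|^t]$), the fourth inequality uses Jensen's inequality, and the last inequality uses the inequality (2.15). ■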
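The chain in (2.15) can be illustrated with exact moments. The sketch below assumes that Definition 1.1 (not restated in this section) reads $E[|Z|^i] \le i L\, E[|Z|^{i-1}]$, and uses two standard examples: the exponential distribution with rate 1, where $E[Z^i] = i!$ and the condition is tight with $L = 1$, and the uniform distribution on $[0,1]$, where $E[Z^i] = 1/(i+1)$. All function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def exp_moment(i):
    """E[Z^i] for Z ~ Exponential(1), namely i!."""
    return Fraction(factorial(i))

def unif_moment(i):
    """E[Z^i] for Z ~ Uniform[0, 1], namely 1 / (i + 1)."""
    return Fraction(1, i + 1)

def is_moment_bounded(moment, L, up_to):
    """Check the assumed condition E|Z|^i <= i * L * E|Z|^(i-1) for i = 1..up_to."""
    return all(moment(i) <= i * L * moment(i - 1) for i in range(1, up_to + 1))

def chain_bound(moment, L, D, tau):
    """Right-hand side of (2.15): L^(D - tau) * D!/tau! * E|Z|^tau."""
    return Fraction(L) ** (D - tau) * Fraction(factorial(D), factorial(tau)) * moment(tau)
```

For the exponential distribution the bound holds with equality for every $\tau \le D$, which matches the remark that exponential random variables are tight for the moment boundedness condition.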



We now prove the following intuitive fact that was left unproven near the end of Section 2.2.

Lemma 2.4 In the notation of Section 2.2 there exists a canonical ordering $h^{(1)}, \ldots, h^{(k)}$ of the hyperedges $\mathcal{H}(G')$ such that $\nu_0 = c$.

Proof. Let $\mathcal{L}$ be the line graph of $G'$, i.e. an undirected graph with one vertex for each of the $k$ hyperedges of $G'$ and an edge connecting every pair of vertices that correspond to hyperedges with intersecting vertex sets. We define the desired sequence of hyperedges $h^{(1)}, \ldots, h^{(k)}$ and a sequence of induced subgraphs $\mathcal{L}_1, \ldots, \mathcal{L}_k$ of $\mathcal{L}$ as follows.

We set $\mathcal{L}_1$ to $\mathcal{L}$. For any $1 \le s \le k$ we form $\mathcal{L}_{s+1}$ from $\mathcal{L}_s$ by removing vertex $h^{(s)}$, where $h^{(s)}$ has the lowest label among the vertices of $\mathcal{L}_s$ subject to the constraint that the number of connected components $n_{s+1}$ of $\mathcal{L}_{s+1}$ must not exceed the number of connected components $n_s$ of $\mathcal{L}_s$. Such a vertex always exists: for example, take an arbitrary leaf of a depth first search tree started from an arbitrary vertex of $\mathcal{L}_s$, or an isolated vertex of $\mathcal{L}_s$ if there are any.

It remains to show that the ordering $h^{(1)}, \ldots, h^{(k)}$ satisfies the desired property $\nu_0 = c$. Note that $h^{(s)}$ contributes to $\nu_0$ if and only if $V_s = V(h^{(s)})$, that is if and only if $h^{(s)}$ is an isolated vertex in $\mathcal{L}_s$. Whenever such an $h^{(s)}$ is chosen the number of connected components decreases by one (i.e. $n_{s+1} = n_s - 1$), and otherwise the number of connected components is unchanged (i.e. $n_{s+1} = n_s$). We conclude that $\nu_0 = n_1 - n_{k+1} = c - 0 = c$ as desired. ■
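The construction in this proof is directly algorithmic. The following is a minimal sketch, assuming each hyperedge is given as a non-empty Python set of vertex labels and is labeled by its position in the list; it ignores the power vectors, which play no role in the ordering. The function names are illustrative, and the quadratic-time component count is chosen for clarity rather than speed.

```python
from itertools import combinations

def line_graph_components(edges, alive):
    """Number of connected components of the line graph induced on `alive`."""
    alive = sorted(alive)
    parent = {i: i for i in alive}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(alive, 2):
        if edges[a] & edges[b]:  # the two hyperedges share a vertex
            parent[find(a)] = find(b)
    return len({find(i) for i in alive})

def canonical_ordering(edges):
    """Repeatedly remove the lowest-labeled hyperedge whose removal does not
    increase the number of connected components of the line graph."""
    alive = set(range(len(edges)))
    order = []
    while alive:
        n_s = line_graph_components(edges, alive)
        for i in sorted(alive):
            if line_graph_components(edges, alive - {i}) <= n_s:
                order.append(i)
                alive.remove(i)
                break
    return order

def nu_zero(edges, order):
    """Count steps s at which every vertex of the removed hyperedge has degree one,
    i.e. the hyperedge shares no vertex with any later hyperedge."""
    count = 0
    for s, i in enumerate(order):
        later_vertices = set().union(*(edges[j] for j in order[s + 1:]))
        if not (edges[i] & later_vertices):
            count += 1
    return count
```

On a hypergraph with a triangle-like component and a doubled hyperedge, such as `[{1, 2}, {2, 3}, {1, 3}, {4, 5}, {4, 5}]`, the ordering removes hyperedges so that `nu_zero` equals the number of components, as the lemma states.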



The following Lemma was used in the proof of the Initial Moment Lemma and will be used later in the proof of the Main Counting Lemma.

Lemma 2.5 Let $G'$ be a labeled hypergraph with all degrees at least two, $c$ connected components with sets of vertices $C_1, \ldots, C_c$ and numbers of hyperedges $k_1, \ldots, k_c$. Further, let $h^{(1)}, \ldots, h^{(k)}$ be the canonical ordering of its hyperedges specified in Lemma 2.4, where $k = \sum_{i=1}^{c} k_i$. Then for each $i = 1, \ldots, c$ we have

$$(q-1)k_i - \sum_{v \in C_i} \delta_v \ge q - 2$$

where $\delta_v$ is the last power of vertex $v$ corresponding to the canonical ordering of hyperedges as defined in Section 2.2.



Proof. Fix a labeled hypergraph $G'$. Recall that the power of a vertex $v$ in hyperedge $h$ is denoted $\tau_{hv}$. Following Section 2.2 define $V_j$ to be the set of vertices whose last incident hyperedge is $h^{(j)}$, where "last" is relative to the canonical ordering $h^{(1)}, \ldots, h^{(k)}$ of the hyperedges defined in that section. Let $I_i = \{ s \in [k] \mid V(h^{(s)}) \subseteq C_i \}$. We charge $(q-1)k_i - \sum_{v \in C_i} \delta_v$ to the various hyperedges in the following natural way:

$$(q-1)k_i - \sum_{v \in C_i} \delta_v = \sum_{j \in I_i} \left( (q-1) - \sum_{v \in V_j} \tau_{h^{(j)}v} \right) = \sum_{j \in I_i} \alpha_j. \qquad (2.16)$$

The contribution $\alpha_s$ of the hyperedge with smallest index $s \in I_i$ is exactly $q - 1$ since the degree of each vertex $v$ is at least two and $h^{(s)}$ is the first hyperedge in this connected component that we delete, i.e. $V_s = \emptyset$. The last hyperedge $h^{(s')}$ for $s' = \max\{ j \mid j \in I_i \}$ clearly contributes $\alpha_{s'} = (q-1) - q = -1$ since $|V_{s'}| = \eta$. For any $j \in I_i \setminus \{s'\}$ we know that $\sum_{v \in V_j} \tau_{h^{(j)}v} \le q - 1$ because $|V_j| \le \eta - 1$. Otherwise component $i$ would contribute more than 1 to $\nu_0$ and $\nu_0$ would exceed $c$. We conclude that $\alpha_j \ge 0$ for $j \in I_i \setminus \{s'\}$. Using (2.16) and these lower bounds on the $\alpha_j$ we bound

$$(q-1)k_i - \sum_{v \in C_i} \delta_v = \sum_{j \in I_i} \alpha_j \ge (q-1) + 0 - 1 = q - 2$$

as desired. ■



The following Corollary follows immediately from Lemma 2.5 by summing over the $c$ connected components.

Corollary 2.6 Let $G'$ be a labeled hypergraph with all degrees at least two, $c$ connected components and the canonical ordering satisfying the condition $\nu_0 = c$. Then we have $(q-1)k - \Delta \ge (q-2)\nu_0$ where $\Delta = \sum_{v \in V(G')} \delta_v = \sum_{t=0}^{q} (q-t)\nu_t$.

3 General Even Moment Lemma 



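Lemma 3.1 (Hölder's Inequality) Let $p_1, \ldots, p_k \in (1, +\infty)$ be such that $\sum_{i=1}^{k} \frac{1}{p_i} = 1$. Then for an arbitrary collection $X_1, \ldots, X_k$ of random variables on the same probability space the following inequality holds:

$$E\left[ \left| \prod_{i=1}^{k} X_i \right| \right] \le \prod_{i=1}^{k} \left( E\left[ |X_i|^{p_i} \right] \right)^{1/p_i}.$$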

We will use the following corollary of Holder's inequality known as Minkowski inequality (or triangle 
inequality for norms). 



Corollary 3.2 (Minkowski Inequality) Let $k \ge 1$ and $Z_1, Z_2, \ldots, Z_m$ be (potentially dependent) random variables with $E[|Z_i|^k] \le z_i^k$ for $z_i \ge 0$. It follows that

$$E\left[ \left| \sum_{i=1}^{m} Z_i \right|^k \right] \le \left( \sum_{i=1}^{m} z_i \right)^k. \qquad (3.17)$$

Lemma 3.3 (General Even Moment Lemma) We are given $n$ independent moment bounded random variables $Y = (Y_1, \ldots, Y_n)$ with the same parameter $L$ and a general power $q$ polynomial $f(x)$ with maximal variable power $\Gamma = \max_{h \in \mathcal{H}, v \in h} \tau_{hv}$. Let $k \ge 2$ be an even integer. Then

$$E\left[ |f(Y) - E[f(Y)]|^k \right] \le \max\left\{ \max_{t \in [q]} \left( \sqrt{k R_4^q \Gamma^t L^t \mu_t \mu_0} \right)^k,\ \max_{t \in [q]} \left( k^t R_4^q \Gamma^t L^t \mu_t \right)^k \right\} \qquad (3.18)$$

where $R_4 \ge 1$ is some absolute constant.

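Proof. Let the weight function $w$ and hypergraph $H = ([n], \mathcal{H})$ be such that $f(Y) = \sum_{h \in \mathcal{H}} w_h \prod_{v \in V(h)} Y_v^{\tau_{hv}}$. Let $X_{hv} = Y_v^{\tau_{hv}} - E[Y_v^{\tau_{hv}}]$. Let $\mathcal{H}'$ denote the set of all possible hyperedges (including the empty hyperedge) with vertices from $V(H) = [n]$ and total power at most $q$. First we note that

$$\begin{aligned}
f(Y) &= \sum_{h \in \mathcal{H}} w_h \prod_{v \in V(h)} \left( X_{hv} + E\left[ Y_v^{\tau_{hv}} \right] \right) \\
&= \sum_{h' \in \mathcal{H}'} \sum_{h \in \mathcal{H} : h \succeq h'} w_h \left( \prod_{v \in V(h) \setminus V(h')} E\left[ Y_v^{\tau_{hv}} \right] \right) \left( \prod_{v \in V(h')} X_{h'v} \right) \\
&= \sum_{h' \in \mathcal{H}'} w'_{h'} \prod_{v \in V(h')} X_{h'v} \qquad (3.19)
\end{aligned}$$

where $h'$ ranges over all possible hyperedges (including the empty hyperedge) and

$$w'_{h'} = \sum_{h \in \mathcal{H} \,|\, h \succeq h'} w_h \prod_{v \in V(h) \setminus V(h')} E\left[ Y_v^{\tau_{hv}} \right].$$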



We next group the monomials on the right hand side of (3.19) by cardinality, power, and sign of coefficient, yielding $m \le 2q^2$ polynomials $g^{(1)}, \ldots, g^{(m)}$ with corresponding weight functions for all monomials $w^{(1)}, \ldots, w^{(m)}$ and powers $q_1, \ldots, q_m$. That is,

$$\begin{aligned}
f(Y) &= w'_{\{\}} + \sum_{i=1}^{m} \sum_{h' : \eta(h') \ge 1} w^{(i)}_{h'} \prod_{v \in V(h')} X_{h'v} \qquad (3.20) \\
&= E[f(Y)] + \sum_{i=1}^{m} g^{(i)}(Y)
\end{aligned}$$

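where $\{\}$ is the empty hyperedge. We have

$$\begin{aligned}
\mu_r(w^{(i)}, Y) \le \mu_r(w', Y) &= \max_{h_0 : q(h_0) = r} \sum_{h' \succeq h_0} |w'_{h'}| \prod_{v \in V(h') \setminus V(h_0)} E\left[ |Y_v|^{\tau_{h'v}} \right] \\
&\le \max_{h_0 : q(h_0) = r} \sum_{h' \succeq h_0} \sum_{h \succeq h'} |w_h| \left( \prod_{v \in V(h) \setminus V(h')} E\left[ |Y_v|^{\tau_{hv}} \right] \right) \prod_{v \in V(h') \setminus V(h_0)} E\left[ |Y_v|^{\tau_{hv}} \right] \\
&= \max_{h_0 : q(h_0) = r} \sum_{h \succeq h_0} \sum_{h' : h \succeq h' \succeq h_0} |w_h| \prod_{v \in V(h) \setminus V(h_0)} E\left[ |Y_v|^{\tau_{hv}} \right] \\
&\le 2^q \max_{h_0 : q(h_0) = r} \sum_{h \succeq h_0} |w_h| \prod_{v \in V(h) \setminus V(h_0)} E\left[ |Y_v|^{\tau_{hv}} \right] = 2^q \mu_r(w, Y) = 2^q \mu_r
\end{aligned}$$

where we upper bounded the number of hyperedges $h'$ such that $h \succeq h' \succeq h_0$ by $2^q$. Therefore, for even $k \ge 2$, Lemma 2.2 implies that

$$E\left[ \left| g^{(i)}(Y) \right|^k \right] = E\left[ g^{(i)}(Y)^k \right] \le 2^{qk} \max\left\{ \left( \sqrt{k R_3^{q_i} \Gamma^{q_i} L^{q_i} \mu_{q_i} \mu_0} \right)^k,\ \max_{t \in [q_i]} \left( k^t R_3^{q_i} \Gamma^t L^t \mu_t \right)^k \right\} = 2^{qk} z_i^k.$$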



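Applying Corollary 3.2 yields

$$\begin{aligned}
E\left[ |f(Y) - E[f(Y)]|^k \right] &= E\left[ \left| \sum_{i=1}^{m} g^{(i)}(Y) \right|^k \right] \le 2^{qk} \left( \sum_{i=1}^{m} z_i \right)^k \le 2^{qk} m^k \max_{i \in [m]} z_i^k \\
&\le \max\left\{ \max_{t \in [q]} \left( \sqrt{k R_4^q \Gamma^t L^t \mu_t \mu_0} \right)^k,\ \max_{t \in [q]} \left( k^t R_4^q \Gamma^t L^t \mu_t \right)^k \right\}
\end{aligned}$$

for some absolute constant $R_4$ such that $m^2 2^{2q} R_3^q \le R_4^q$. ■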

4 Proof of Theorem 1.4



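Now we prove Theorem 1.4 by applying Markov's inequality.
Proof. By Markov's inequality we derive

$$\Pr\left[ |f(Y) - E[f(Y)]| \ge \lambda \right] = \Pr\left[ |f(Y) - E[f(Y)]|^k \ge \lambda^k \right] \le \frac{E\left[ |f(Y) - E[f(Y)]|^k \right]}{\lambda^k}.$$

We choose $k^* \ge 0$ to be the even integer such that $k^* \in (K - 2, K]$ for

$$K = \min\left\{ \min_{t \in [q]} \frac{\lambda^2}{e^2 R_4^q \Gamma^t L^t \mu_t \mu_0},\ \min_{t \in [q]} \left( \frac{\lambda}{e R_4^q \Gamma^t L^t \mu_t} \right)^{1/t} \right\},$$

i.e.

$$\frac{\sqrt{k^* R_4^q \Gamma^t L^t \mu_t \mu_0}}{\lambda} \le 1/e \quad \text{and} \quad \frac{(k^*)^t R_4^q \Gamma^t L^t \mu_t}{\lambda} \le 1/e$$

for all $t \in [q]$. Using the inequality (3.18) from Lemma 3.3 we derive

$$\begin{aligned}
\Pr\left[ |f(Y) - E[f(Y)]| \ge \lambda \right] &\le \frac{E\left[ |f(Y) - E[f(Y)]|^{k^*} \right]}{\lambda^{k^*}} \\
&\le \max\left\{ \max_{t \in [q]} \left( \frac{\sqrt{k^* R_4^q \Gamma^t L^t \mu_t \mu_0}}{\lambda} \right)^{k^*},\ \max_{t \in [q]} \left( \frac{(k^*)^t R_4^q \Gamma^t L^t \mu_t}{\lambda} \right)^{k^*} \right\} \\
&\le e^{-k^*} \le e^{-K+2} \\
&\le e^2 \cdot \max\left\{ \max_{t \in [q]} e^{-\frac{\lambda^2}{R^q \Gamma^t L^t \mu_t \mu_0}},\ \max_{t \in [q]} e^{-\left( \frac{\lambda}{R^q \Gamma^t L^t \mu_t} \right)^{1/t}} \right\}
\end{aligned}$$

for some universal constant $R \ge R_4 \ge 1$. This implies the statement of the Theorem. ■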
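The choice of the moment order $k^*$ can be sketched as follows. This is only an illustration under stated assumptions: the absolute constant $R_4$ is not given a numerical value in the text, so it is passed in as a parameter; `mu` is a hypothetical list holding $\mu_0, \ldots, \mu_q$; and $K$ is taken to be the largest value for which both displayed conditions hold for all $t \in [q]$.

```python
import math

def choose_k_star(lam, mu, L, Gamma, R4):
    """Return (K, k_star), with k_star the even integer in (K - 2, K].

    mu    : list [mu_0, ..., mu_q] (hypothetical container for the mu_t)
    R4    : placeholder value for the absolute constant R_4
    """
    q = len(mu) - 1
    quadratic = min(
        lam ** 2 / (math.e ** 2 * R4 ** q * Gamma ** t * L ** t * mu[t] * mu[0])
        for t in range(1, q + 1)
    )
    power = min(
        (lam / (math.e * R4 ** q * Gamma ** t * L ** t * mu[t])) ** (1.0 / t)
        for t in range(1, q + 1)
    )
    K = min(quadratic, power)
    k_star = 2 * math.floor(K / 2)  # largest even integer not exceeding K
    return K, k_star
```

With this choice both ratios in the proof are at most $1/e$, so the Markov bound is at most $e^{-k^*} \le e^{-K+2}$.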
5 Counting Lemma

In this section we consider labeled hypergraphs in $S_2(k, \ell)$ for fixed parameters $\eta$, $q$ and $\Gamma$. We use $C_1, \ldots, C_c$ to denote the sets of vertices in the connected components of a labeled hypergraph. We use $\ell_1, \ldots, \ell_c$ and $k_1, \ldots, k_c$ to denote the number of vertices and hyperedges in those connected components. We will freely use the following elementary facts:

1. $\eta \le \ell_i \le \eta k_i / 2$ where the lower bound follows from the fact that each connected component has at least one hyperedge and the upper bound follows from the fact that each vertex has degree at least two;

2. $\eta c \le \ell \le \eta k / 2$ (these inequalities are obtained by summing up the above inequalities over all connected components);

3. $1 \le c \le k/2$, the lower bound is obvious and the upper bound follows from the previous inequality.

In two of the auxiliary lemmas below we will use the classical Gibbs inequality which states that for two arbitrary discrete probability distributions $p_1, \ldots, p_n$ and $q_1, \ldots, q_n$ with strictly positive $p_i$, $q_i$ the following inequality holds

$$-\sum_{i=1}^{n} p_i \log_2 p_i \le -\sum_{i=1}^{n} p_i \log_2 q_i$$

or equivalently

$$\prod_{i=1}^{n} q_i^{p_i} \le \prod_{i=1}^{n} p_i^{p_i}.$$
In what follows we identify a vertex with its index v £ [£]. The main statement of this section is 
the following 

Lemma 5.1 (Main Counting Lemma) For any $k$, $\Gamma$, $q \ge \eta \ge 1$, $\ell$, $c$, $D$, $d \ge 2$ and $\delta$ we have

$$|S(k, \ell, c, d, D, \delta)| \left( \prod_{v \in [\ell]} \frac{D_v!}{\delta_v!} \right) \le R_0^{qk}\, \Gamma^{qk - \ell - \sum_{v \in [\ell]} \delta_v}\, k^{qk - c(q-1) - \sum_{v \in [\ell]} (\delta_v - 1)}$$

for some universal constant $R_0 \ge 1$.

We prove Lemma 5.1 as a sequence of auxiliary Lemmas.

We say that $C_1, \ldots, C_c$ and $k_1, \ldots, k_c$ are feasible (with respect to $d$, $D$, $\delta$ clear from context) if there is a labeled hypergraph in $S(k, \ell, c, d, D, \delta)$ with corresponding canonical ordering of its hyperedges whose connected components (numbered arbitrarily) have vertex sets $C_1, \ldots, C_c$ and numbers of hyperedges $k_1, \ldots, k_c$.



Lemma 5.2 For any $k$, $\ell$, $q$, $\eta$, and $d$ we have

$$\left| \bigcup_{c, D, \delta} S(k, \ell, c, d, D, \delta) \right| \le \binom{q-1}{\eta-1}^{k} \prod_{v \in [\ell]} \binom{k}{d_v} \le \frac{2^{(q-1)k}\, k^{\eta k}}{\prod_{v \in [\ell]} d_v!}.$$

Note that we intentionally do not fix $\Gamma$ since the bound in the Lemma holds for any $\Gamma \le q$.

Proof. Fix $k$, $\ell$, $\eta$, $q$, and $d$. To show the first inequality, note that a labeled hypergraph is uniquely specified by:

1. for every vertex $v = 1, \ldots, \ell$ whether or not it appears in each of the $k$ hyperedges and

2. for every hyperedge $h$ the power vector of its $\eta$ vertices.

Vertex $v$ with degree $d_v$ clearly has $\binom{k}{d_v}$ possible sets of hyperedges it can appear in, so there are at most $\prod_{v \in [\ell]} \binom{k}{d_v}$ ways to assign vertices to the hyperedges. In general this is quite a rough estimate (since we do not use the fact that each hyperedge contains exactly $\eta$ vertices). More precisely we would like to estimate the number of 0/1 matrices of dimension $k \times \ell$ with prescribed row sums equal to $\eta$ and a column sum $d_v$ for the column indexed by $v$. Estimating this quantity is an important topic in combinatorics (see the survey and references therein) but for our purposes the simple estimate above provides a sufficiently tight bound.

We now count the ways to assign weights to the hyperedges. Recall that $\tau_{hv}$ denotes the weight of the $v$-th vertex in hyperedge $h$. There is a standard bijection between $q - 1$ digit binary strings with $\eta - 1$ zeros and placements of $q$ identical items into $\eta$ bins such that each bin has at least one item. The string starts with $\tau_{1,h} - 1$ ones followed by a zero, followed by $\tau_{2,h} - 1$ ones followed by a zero and so on, ending with $\tau_{\eta,h} - 1$ ones (and no trailing zero). We conclude that there are $\binom{q-1}{\eta-1}^{k}$ ways to assign weights of the hyperedges. This concludes the proof of the first inequality.

The second inequality follows because $\binom{q-1}{\eta-1} \le 2^{q-1}$, $\binom{k}{d_v} \le k^{d_v} / d_v!$, and $\sum_{v} d_v = \eta k$. ■
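The bijection between power vectors and binary strings used in this proof can be sketched as follows; the function names are illustrative. A power vector is a tuple of $\eta$ positive integers summing to $q$, and its code is a string of length $q - 1$ with exactly $\eta - 1$ zeros.

```python
from itertools import product
from math import comb

def encode(powers):
    """Map a power vector to its binary string: each power p becomes p - 1 ones,
    with a single zero separating consecutive entries (no trailing zero)."""
    return "0".join("1" * (p - 1) for p in powers)

def decode(code):
    """Inverse of encode: each maximal run between zeros gives one power."""
    return tuple(len(run) + 1 for run in code.split("0"))

def power_vectors(q, eta):
    """All vectors of eta positive integers with total power q."""
    return [vec for vec in product(range(1, q + 1), repeat=eta) if sum(vec) == q]
```

Enumerating the vectors for small $q$ and $\eta$ recovers the count $\binom{q-1}{\eta-1}$ per hyperedge, which is then raised to the power $k$ in the lemma.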
Lemma 5.3 For any k, I, c, q, rj, d > 2, D and 5 we have 

i 5 (M,o,<?,AJ)i < (n cr *c„ (n^dci . 

\ve[e] ) ki,...,k c \' =1 

feasible 

where the maximums are evaluated over all $C_1, \ldots, C_c$ and $k_1, \ldots, k_c$ that are feasible as defined above. This bound holds for any $\Gamma \le q$.

Proof. We prove the Lemma by mapping the labeled hypergraphs in $S(k, \ell, c, d, D, \delta)$ into distinct binary strings and bounding the length of these strings.

Fix an arbitrary hypergraph in $S(k, \ell, c, d, D, \delta)$. Our encoding begins by encoding the vertices in the connected component that contains vertex 1. Let the vertices in this component be denoted by $C_1$ and $|C_1| = \ell_1$. We encode $\ell_1$ in unary, e.g. our string begins with 1110 if $\ell_1 = 3$. We then encode the identity of the remaining $\ell_1 - 1$ vertices in $C_1 \setminus \{1\}$ using a single character with $\log_2 \binom{\ell - 1}{\ell_1 - 1}$ binary digits, i.e. we have $\binom{\ell - 1}{\ell_1 - 1}$ options which are encoded in binary.

We then look at the lowest-indexed vertex that has yet to be placed in a connected component and encode the size $\ell_2$ and vertices $C_2$ of its component in the same manner. We repeat until all $\ell$ vertices have been placed in one of the $c$ connected components, where the $i$-th component considered has $\ell_i$ vertices. At this point we have partitioned the vertices into connected components using



log 2 



£ 

- 1 



(5.21) 



bits. 






We then encode the number of hyperedges $k_i$ in each connected component in unary using

$$\sum_{i=1}^{c}(k_i+1) = k+c \tag{5.22}$$

bits, i.e. we have $k_1$ ones followed by a zero, followed by $k_2$ ones followed by a zero and so on. There
are

$$\frac{k!}{k_1!\cdots k_c!}$$

ways to partition $k$ indices into $c$ groups where the $i$-th group has $k_i$ indices. Therefore,
we can encode which component each hyperedge is in using

$$\log_2\frac{k!}{k_1!\cdots k_c!} \tag{5.23}$$

bits.
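The two counts used in this step of the encoding can be illustrated on a toy instance. This is a sketch only; the helper names `unary_encoding`, `multinomial` and `count_assignments` are hypothetical, and the sizes `(2, 1, 2)` are arbitrary.

```python
from itertools import product
from math import factorial

def unary_encoding(ks):
    """k_1 ones, a zero, k_2 ones, a zero, ...: uses sum(k_i + 1) = k + c bits."""
    return "".join("1" * ki + "0" for ki in ks)

def multinomial(ks):
    """k! / (k_1! * ... * k_c!) with k = sum(ks)."""
    total = factorial(sum(ks))
    for ki in ks:
        total //= factorial(ki)
    return total

def count_assignments(ks):
    """Brute-force count of ways to split k indices into groups of sizes ks."""
    c, k = len(ks), sum(ks)
    return sum(
        1
        for a in product(range(c), repeat=k)
        if all(a.count(i) == ks[i] for i in range(c))
    )

ks = (2, 1, 2)
print(unary_encoding(ks), multinomial(ks), count_assignments(ks))
```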



Finally we encode the vertices and the power vectors of each hyperedge in component $1 \le i \le c$. The
number of possibilities is clearly $|S(k_i,\ell_i,1,d|_i,D|_i,\delta|_i)|$, where $d|_i$ (resp. $D|_i$ and $\delta|_i$) is the vector
of degrees (resp. total powers and final powers) of the vertices in component $i$. Using Lemma 5.2
we bound $|S(k_i,\ell_i,1,d|_i,D|_i,\delta|_i)| \le \frac{2^{qk_i}k_i^{\eta k_i}}{\prod_{v\in C_i}d_v!}$. Therefore, the total number of bits used to encode
the hyperedges is at most

$$\log_2\left(\prod_{i=1}^{c}\frac{2^{qk_i}k_i^{\eta k_i}}{\prod_{v\in C_i}d_v!}\right) \tag{5.24}$$

bits.



Combining (5.21), (5.22), (5.23), (5.24) we obtain that the total number of bits used to encode an
arbitrary hypergraph in $S(k,\ell,c,d,D,\delta)$ is upper bounded by the maximum over feasible $C_1,\dots,C_c$
and $k_1,\dots,k_c$ of



i=l 



log 2 



1 



+k+c+ 



log 2 



kl 



fci!-...-fc c ! 



+ 



lo, 



A' 



r)ki 



\i=l 



which we'll denote by $b$. We can safely add trailing zeros so that each hypergraph is encoded using
exactly $b$ bits. We conclude that the number of hypergraphs is at most $2^b$, which is at most the
right-hand side of the Lemma statement, as desired. ■

Lemma 5.4 For any $k$, $\Gamma$, $\ell$, $c$, $q \ge \eta \ge 1$, $D$, $d \ge 2$ and $\delta$ we have



< e O(qk) k qk-J2 vm (8 v -l) T qk-e-Y: vm S v TT 

Ci,...,C c A -L 
kx,...,k c 1=1 

feasible 



j fe .^(«-l)fci-E,60<( i »- 1 )-(l C i|- 1 ) 
1 



(5.25) 



Proof. Applying Lemma 5.3 we get
(n^f) \S(k,£,c,d,D,S)\ 



< max 
Ci,...,C c 
ki , • • .,&c 



A,! 



n 



(5.26) 
We will now bound each factor of (5.26) in turn, making frequent use of the formula
$n! = (n/e)^n n^{O(1)}$ and the inequality $\binom{n}{m} \le n^m/m!$.

First we bound 

2 i +k+5c+q k kl ^ e O{qk) k k_ ^ 

Secondly we bound 

n -PJ yr rr {Dy ~ d v )l ( D v " 
IA I " 1111 A I I 

« = 1 1=1 v£d 

C 

c 



< J [ D Dv ~^ v ~^ vJr ^2 Dv 
i=i veCi 



< JJ(-p;^.) , J fc i-^-E l ,6C l ( <5 «- 1 )2^i 



using the facts X^eC, ^ = X^eC*; ^ = ^ ano - ^« — — We fi nau y observe that 
(q - r])k - Ylve[ej( S v ~ 1) < qk - 2£ - Y^ve[£] S v + A yielding 

tZ-L) . 5 V ! . 

D = l 2=1 

For the third factor we consider two cases. If $\eta \ge 2$ we have $\ell_i \ge 2$ and $c \le \ell/2$ and we bound

° / £ \ £4-1 ° e O(^)^-i 

S U - J - S w-ty - S (a - 1)4-1 



c 



C /> N -(4-1) 



1=1 



<e°W[I(|) (5.29) 



where the last inequality is Gibbs'. It turns out that (5.29) holds when $\eta = 1$ as well because every
$\ell_i = 1$ and hence both $\prod_{i=1}^{c}\binom{\ell}{\ell_i-1}$ and $\prod_{i=1}^{c}\left(\frac{k_i}{k}\right)^{-(\ell_i-1)}$ are equal to 1.

Fourth we write 

JJ = e°^*) JI ^- 1)fci . (5.30) 

8=1 1=1 
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Combining ( ^T\ ), (|5T2§| ), (|53o| ) and (|]§) we get 



< max 

Ci ,..-,c c 

/ci . .... A', 



n 

i=l 



e O(gfe) fc fe r gfc-<-E„ eM «« max 

Ci r .,C< 



rw 



e O(gfc)^-E IJ GM( 5 «- 1 )r9fc-^-E t , eM '5« max TT 

d,...,c c J-i v fc 

l,...,fc c 



c / fc \ (5-l)fc i -E weO .(« -l)-(£i-l) 



Our final counting lemma bounds the optimization problem of Lemma 5.4.
Lemma 5.5 For any $k$, $\ell$, $c$, $q \ge \eta \ge 1$, $d \ge 2$, $D$, and $\delta$ we have

$$\max_{\substack{C_1,\dots,C_c \\ k_1,\dots,k_c \\ \text{feasible}}}\ \prod_{i=1}^{c}\left(\frac{k_i}{k}\right)^{(q-1)k_i-\sum_{v\in C_i}(\delta_v-1)-(|C_i|-1)} \le k^{-(c-1)(q-1)}e^{O(qk)}.$$

Proof. We are looking to upper-bound

$$M = \max_{\substack{C_1,\dots,C_c \\ k_1,\dots,k_c \\ \text{feasible}}}\ \prod_{i=1}^{c} a_i^{z_i} \tag{5.31}$$

where $z_i = (q-1)k_i - (|C_i|-1) - \sum_{v\in C_i}(\delta_v-1) = 1 + (q-1)k_i - \sum_{v\in C_i}\delta_v$ and $a_i = k_i/k$.
We upper-bound $M$ by the relaxation

$$\max_{\substack{a_1,\dots,a_c \\ z_1,\dots,z_c}}\ \prod_{i=1}^{c} a_i^{z_i} \quad\text{such that} \tag{5.32}$$
$$\sum_{i=1}^{c} a_i = 1 \tag{5.33}$$
$$a_i \ge 0 \tag{5.34}$$
$$\sum_{i=1}^{c} z_i = Z \tag{5.35}$$
$$z_i \ge q-1 \tag{5.36}$$

where $Z = \sum_{i=1}^{c} z_i = c + (q-1)k - \sum_{v\in[\ell]}\delta_v$. To show this is a relaxation we need to prove that any
$z_i$ feasible in (5.31) satisfies (5.36), which follows from Lemma 2.5 which states that
$(q-1)k_i - \sum_{v\in C_i}\delta_v \ge q-2$. Another implication of that lemma is that $Z \ge (q-1)c$.
When $q = 1$ we can trivially prove the Lemma by upper-bounding (5.32) by 1, so we hereafter
assume $q \ge 2$ and hence every $z_i$ is strictly positive. For any fixed $\{z_i\}_i$, Gibbs' inequality implies
that the maximum of (5.32) occurs when $a_i = z_i/Z$. Therefore we have reduced our problem to

$$\max_{z_1,\dots,z_c}\ \prod_{i=1}^{c}\left(\frac{z_i}{Z}\right)^{z_i} \quad\text{such that}\quad \sum_{i=1}^{c} z_i = Z,\qquad z_i \ge q-1. \tag{5.37}$$

Clearly the optimum is when $z_i = q-1$ for all $i \ne 1$ and $z_1 = Z-(c-1)(q-1)$. Therefore the
maximum of (5.37) is



Z - ( c - l)(q - 1) y-Wil-V (q_i\ 



< 1 



(c-l)(«-l) 



jn (c-l)(9-l) (c-l)(ff-l) 
< jfe-(c-l)(«-l) e O(fffc) 



(5.38) 



using the fact that $Z = \sum_{i=1}^{c} z_i \ge (q-1)c$ in the first inequality.
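The use of Gibbs' inequality in the proof above (for fixed positive $z_i$, the product $\prod_i a_i^{z_i}$ over probability vectors $a$ is largest at $a_i = z_i/Z$) can be sanity-checked numerically. The sketch below is illustrative only: the vector `z` and the number of random trials are arbitrary choices, and a random search is of course not a proof.

```python
import math
import random

def log_objective(a, z):
    """log of prod_i a_i ** z_i."""
    return sum(zi * math.log(ai) for ai, zi in zip(a, z))

def gibbs_maximizer(z):
    """The claimed maximizer a_i = z_i / Z."""
    total = sum(z)
    return [zi / total for zi in z]

def random_probability_vector(size, rng):
    raw = [rng.random() + 1e-9 for _ in range(size)]
    s = sum(raw)
    return [r / s for r in raw]

z = [3.0, 1.0, 2.5, 4.0]
best = log_objective(gibbs_maximizer(z), z)
rng = random.Random(0)
# Count random probability vectors that beat the claimed maximizer.
violations = sum(
    1
    for _ in range(1000)
    if log_objective(random_probability_vector(len(z), rng), z) > best + 1e-9
)
print(violations)
```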
We are finally ready to prove our Main Counting Lemma. 



Proof of Lemma 5.1. Lemmas 5.4 and 5.5 give us



< e °('J fc )fc 9fc ~^«eM( <5,; ~ 1 )r' 3fc ~^~^' uS M <5t 'A;~ c ^ _1 ^ + ^~^ 

< R^ k Y qk ~ e ~^ v ^W 5v ^ fc_c ('?- 1 )-E 1 , e [f]('5 I) -l) 

for some absolute constant $R_0 > 1$ (we used the fact that $k^{q-1} = e^{O(qk)}$).

6 Permanents of Random Matrices 



Proof of Theorem 1.5. Notice first that $\mu_t \le (n-t)! \le n^{n-t}$ and the power of the polynomial
is $q = n$. Since the permanent is a multilinear polynomial and $E[Y_v] = 0$ we can directly apply
Lemma 2.1 for $k \le n$. Note also that $n$ in this Theorem is the dimension of the matrix and not the
number of random variables as in Lemma 2.1 (which is $n^2$ in this setting). We obtain



E 



P(A) k ] < maxlR nk k nk - e ]_ 



n 



(n—t)v t 



V t=0 ) 

max {R nk k nk ~ V} = R nk k nk max j (~ 



< R nk k nk/2 n nk/2 . 
We fix the deviation $\lambda = t\sqrt{n!} > 0$ and choose $k^*$ to be the even number in the interval $(K-2, K]$
for $K = \left(\frac{\lambda}{eR^n n^{n/2}}\right)^{2/n}$. Using Markov's inequality we derive

$$\Pr[|P(A)| \ge \lambda] \le \frac{E\left[|P(A)|^{k^*}\right]}{\lambda^{k^*}} \le e^{k^*\ln\left(\frac{R^n (k^*)^{n/2} n^{n/2}}{\lambda}\right)} \le e^{-k^*} \le e^{-K+2} = e^2\cdot e^{-\frac{(\lambda/e)^{2/n}}{R^2 n}} \le e^2\cdot e^{-c\,t^{2/n}},$$

for some absolute constant $c > 0$. Note that condition $k^* \le n$ is implied by the condition $K \le n$
which in turn is equivalent to the condition $\lambda \le eR^n n^n$. If $\lambda > eR^n n^n$ then we choose $k^* = n$ and
estimate

$$\Pr[|P(A)| \ge \lambda] \le \Pr[|P(A)| \ge eR^n n^n] \le \frac{E\left[|P(A)|^{k^*}\right]}{(eR^n n^n)^{k^*}} \le e^{-n}.$$
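As background for the scale of the deviations considered in this proof, recall the standard fact that for independent entries with mean zero and unit variance the permanent satisfies $E[P(A)] = 0$ and $E[P(A)^2] = n!$. The following sketch verifies this exactly for small $n$ by enumerating all $\pm 1$ matrices; the choice of uniform $\pm 1$ entries and the helper names are assumptions made for the illustration.

```python
from itertools import permutations, product
from math import factorial, prod

def permanent(a):
    """Permanent of a square matrix given as a list of rows."""
    n = len(a)
    return sum(prod(a[i][p[i]] for i in range(n)) for p in permutations(range(n)))

def exact_moments(n):
    """Return (E[P], E[P^2]) for an n x n matrix of independent uniform +-1 entries."""
    total1 = total2 = count = 0
    for entries in product((-1, 1), repeat=n * n):
        a = [entries[i * n:(i + 1) * n] for i in range(n)]
        p = permanent(a)
        total1 += p
        total2 += p * p
        count += 1
    return total1 / count, total2 / count

for n in (2, 3):
    print(n, exact_moments(n), factorial(n))
```

The enumeration grows as $2^{n^2}$, so this is only feasible for very small $n$.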



Proof of Theorem 1.6. We have an $n$ by $n$ matrix $A$ with entries that are independent except
that the matrix is symmetric. The permanent $P(A)$ is a degree $n$ polynomial of independent
random variables with maximal variable degree $\Gamma = 2$. (Note that the number of variables is
$\binom{n}{2} + n$, not $n$.) The permanent is a sum of products over permutations. We also treat each such
permutation $\pi$ as a set of pairs $(i,j)$ for row index $i$ and column index $j$. We write $h(\pi)$ for the
hyperedge $h$ corresponding to $\pi$. More generally for any set $S$ of matrix entries we write $h(S)$ for
the corresponding hyperedge. Note that because each variable appears in up to two positions in
the matrix the mapping $h$ is not a bijection. Clearly

n n 

P ^)=EII^fl=E E IlAirW 

i" i=l h Tr:h(ir)=h i=l 

n y v ^ 

vev(h) 




= Wh 



As in the proof of Lemma 3J we write P(A) as the sum of polynomials g^ , . . . , g^ m ' with weights 
. . . , 7u( m ) and total powers qi, . . . , q m (in this case E [P (A)} = 0). 

Fix some 1 < i < m and hyperedge h' with q(h') = qi. The next step is to bound coefficients of 
polynomials gW , 

4 )= e^( n e ^ 

h^h' \v&V(h)\V(h') 

= e f n e k^' 

■K-.h{it)>h' \veV{h(w))\V(h') 

< e e ( n e cw 

S:h{S)=h' w.ttDS \v£V{h{n))\V(h') 
where $S$ is a set of matrix entries. We can bound the number of such $S$ by $2^{O(q_i)}$ since for each vertex
(variable) $v \in h'$ there are only two entries in the matrix $A$ that can be mapped to $v$ by the mapping
$h(S)$. Fix an $S$ such that $h(S) = h'$. Note that whenever $\tau_{hv} = 1$ we have $E[Y_v^{\tau_{hv}}] = 0$, so we can
restrict the sum to be over permutations $\pi$ with $\tau_{hv} = 2$ for all $v \in V(h(\pi)) \setminus V(h')$. For every fixed
$S$, the number of such $\pi$ is at most $n^{(n-q_i)/2}$ since we need to choose the remaining $n - q_i$ entries and
each choice fixes two positions in $\pi$. By moment boundedness we have
$\prod_{v\in V(h(\pi))\setminus V(h')} E\left[|Y_v|^{\tau_{hv}}\right] \le 2^{(n-q_i)/2}$. We conclude that $w^{(i)}_{h'} \le 2^{O(n)} n^{(n-q_i)/2}$.
Fix some $g^{(i)}$ with total power $q_i = q \le n$. We start from (2.11):

krj/2 



E 



g ®(Y)" 



< 



E 



E 



E 



n 



w 



K(u)(G') 



e=r) G'es 2 (k,e.)ireM([£}) \hen(G>) 



.rteV(G') 



<{n 2 Y <(2°(")n(™-9)/ 2 )'= 



krj/2 ( \ 

^ E^T E (n 2 ) < (2°Wn^V 2 )* I 2«* J] D « ! 

^=J7 ' G'eS 2 (M) \ ueV(G') / 



k(n-q)/2 



e=r) 
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< OT* " k lk n k(n-q)/2 



(6.39) 

where the third inequality bounded 2<? fc | S 2 (£) \ U u D u } - by Rfk qk using Lemma |5.1| . Recall that 
$\max_{x>0}(nk/x)^x = e^{nk/e}$. We continue using the facts that we choose $k \le n$ and $\ell \le \eta k/2 \le qk/2$,



E 



nk 



-) k qk n k{n ~ q)/2 



< R f k ^ n Kn- q) /2 

< Rf n lk/2 k qk/2 n k{n-q)/2 

< RS k n nk/2 k nk/2 . 
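The identity $\max_{x>0}(nk/x)^x = e^{nk/e}$ recalled above follows from setting the derivative of $x\ln(a/x)$ to zero, which gives $x = a/e$. A small numerical sketch, with the arbitrary illustrative value `a = 12.0` standing in for $nk$:

```python
import math

def log_value(a, x):
    """Natural log of (a / x) ** x for a, x > 0."""
    return x * math.log(a / x)

a = 12.0                      # plays the role of nk
x_star = a / math.e           # stationary point of x * ln(a / x)
grid = [0.01 * j for j in range(1, 5001)]   # x in (0, 50]
grid_max = max(log_value(a, x) for x in grid)
print(log_value(a, x_star), a / math.e, grid_max)
```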



Applying Corollary [O yields 



E 



\P(A)\ h 



< E 



£k (i) oo 



< m k E% k n nk l 2 k nk l 2 

< Rfn nk / 2 k nk/2 

since m < 2n 2 < 10 n and where -R9 is an absolute constant. 

This gives us a bound on $|E[P(A)^k]|$ identical to that in the proof of Theorem 1.5, so we finish
the proof identically to the proof of Theorem 1.5. ■



7 Examples of Moment Bounded Random Variables 

In this section we show that three classes of random variables are moment bounded and give
examples from each class. The classes are bounded random variables, log-concave continuous
random variables, and log-concave discrete random variables.



7.1 Bounded random variables 

Lemma 7.1 Any random variable $Z$ with $|Z| \le L$ is moment bounded with parameter $L$.
Proof. For any $i \ge 1$ we clearly have $|Z|^i \le L|Z|^{i-1}$ hence $E[|Z|^i] \le L\,E[|Z|^{i-1}] \le iL\,E[|Z|^{i-1}]$. ■



In particular Lemma 7.1 implies that 0/1 and -1/1 random variables are moment bounded with 
parameter 1. 
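Reading moment boundedness with parameter $L$ as the condition $E[|Z|^i] \le iL\,E[|Z|^{i-1}]$ for all $i \ge 1$ (the form used in the proof of Lemma 7.1), the claim can be checked exactly for finite distributions. The sketch below is an illustration under that reading; `is_moment_bounded` only tests exponents up to a hypothetical cutoff `max_i`, so a `True` result is evidence rather than proof.

```python
from fractions import Fraction

def abs_moment(dist, i):
    """E[|Z|^i] for a finite distribution given as {value: probability}."""
    return sum(p * abs(v) ** i for v, p in dist.items())

def is_moment_bounded(dist, L, max_i=20):
    """Check E|Z|^i <= i * L * E|Z|^(i-1) for 1 <= i <= max_i."""
    return all(
        abs_moment(dist, i) <= i * L * abs_moment(dist, i - 1)
        for i in range(1, max_i + 1)
    )

rademacher = {-1: Fraction(1, 2), 1: Fraction(1, 2)}   # -1/1 variable
bernoulli = {0: Fraction(3, 4), 1: Fraction(1, 4)}     # 0/1 variable
print(is_moment_bounded(rademacher, 1), is_moment_bounded(bernoulli, 1))
```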

7.2 Log-concave continuous random variables 

We say that a non-negative function $f$ is log-concave if $f(\lambda x + (1-\lambda)y) \ge f(x)^{\lambda} f(y)^{1-\lambda}$ for any
$0 \le \lambda \le 1$ and $x, y \in \mathbb{R}$ (see [15] Section 3.5). Equivalently $f$ is log-concave if $\ln f(x)$ is concave
on the set $\{x : f(x) > 0\}$ where $\ln f(x)$ is defined and this set is a convex set (i.e. an interval). A
continuous random variable (or a continuous distribution) with density $f$ is log-concave if $f$ is a
log-concave function. See |j [6|, ||, |l^] for introductions to log-concavity. 
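The defining inequality can be probed numerically at $\lambda = 1/2$. The sketch below tests only the midpoint inequality on a finite grid, so it can refute log-concavity but not establish it; the grid and the two example densities (standard normal, standard Cauchy) are illustrative choices, not taken from the text.

```python
import math

def passes_midpoint_test(f, xs, tol=1e-12):
    """Check f((x + y) / 2) >= sqrt(f(x) * f(y)) for all grid pairs.
    This is the lambda = 1/2 case of the definition, a necessary condition."""
    for x in xs:
        for y in xs:
            if f(0.5 * (x + y)) + tol < math.sqrt(f(x) * f(y)):
                return False
    return True

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cauchy_pdf(x):
    return 1.0 / (math.pi * (1.0 + x * x))

grid = [0.25 * j for j in range(-20, 21)]   # points in [-5, 5]
print(passes_midpoint_test(normal_pdf, grid), passes_midpoint_test(cauchy_pdf, grid))
```

The Cauchy density fails the test (for example at $x = 2$, $y = 4$), consistent with its heavy tails.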

Lemma 7.2 Any non-negative log-concave random variable $X$ with density $f$ is moment bounded
with parameter $L = E[X]$.

Proof. Let $\ell = \inf\{x \ge 0 : f(x) > 0\}$ and $u = \sup\{x \ge 0 : f(x) > 0\}$. By log-concavity we
have that $f(x) > 0$ for all $\ell < x < u$. Let $F(x) = \Pr[X \le x]$ and $\bar F(x) = \Pr[X > x]$. Note that
$\bar F(x) = 0$ for all $x \ge u$. For any $i \ge 1$ we write



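$$\begin{aligned}
E[X^i] &= \int_{x=0}^{\infty} x^i\,dF(x) = -\int_{x=0}^{\infty} x^i\,d\bar F(x) \\
&= \Big[-x^i\bar F(x)\Big]_{x=0}^{\infty} + \int_{x=0}^{\infty}\bar F(x)\,d(x^i) = \int_{x=0}^{\infty}\bar F(x)\,i x^{i-1}\,dx \\
&= \int_{x=0}^{\ell} i x^{i-1}\,dx + \int_{x=\ell}^{u}\frac{\bar F(x)}{f(x)}\,i x^{i-1} f(x)\,dx
\end{aligned} \tag{7.40}$$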



where the third equality is integration by parts. It is known (see for example implication B of
Proposition 1 in || or Theorem 2 in ||) that log-concavity of density $f$ implies log-concavity of
$\bar F(x)$. It follows that $d(\ln \bar F(x))/dx = -f(x)/\bar F(x)$ is a non-increasing function of $x$ on $(\ell,u)$ and
hence $\bar F(x)/f(x)$ is also non-increasing. It follows that $\frac{\bar F(x)}{f(x)}\cdot i x^{i-1}$ is a product of a non-increasing
function and a non-decreasing function. We apply Chebyshev's integral inequality, yielding,

$$\begin{aligned}
E[X^i] &\le \int_{x=0}^{\ell} i x^{i-1}\,dx + \left(\int_{x=\ell}^{u}\frac{\bar F(x)}{f(x)}\,f(x)\,dx\right)\left(\int_{x=\ell}^{u} i x^{i-1} f(x)\,dx\right) \\
&= \ell^i + \left(\int_{x=\ell}^{u}\bar F(x)\,dx\right)\cdot iE[X^{i-1}] \\
&\le \ell\cdot iE[X^{i-1}] + \left(\int_{x=\ell}^{+\infty}\bar F(x)\,dx\right)\cdot iE[X^{i-1}] \\
&= \left(\int_{x=0}^{+\infty}\bar F(x)\,dx\right)\cdot iE[X^{i-1}] = E[X]\cdot iE[X^{i-1}]
\end{aligned}$$

where we used the fact $E[|X|] = \int_{0}^{+\infty}\bar F(x)\,dx$. ■

Lemma 7.3 Any log-concave random variable $X$ with density $f$ is moment bounded with parameter
$L = \frac{1}{\ln 2}E[|X|] \approx 1.44\,E[|X|]$.

Proof. If $X$ is non-negative or non-positive with probability 1 the Lemma follows from Lemma
7.2, so suppose not. Write

$$\begin{aligned}
E[|X|^k] &= \Pr[X>0]\,E[X^k \mid X>0] + \Pr[X<0]\,E[(-X)^k \mid X<0] \\
&= \Pr[X>0]\,E[X_+^k] + \Pr[X<0]\,E[X_-^k]
\end{aligned}$$

where $X_+$ (resp. $X_-$) is a non-negative random variable with density at $x$ proportional to $f(x)$
(resp. $f(-x)$) for $x \ge 0$ and zero for $x < 0$. Clearly $X_+$ and $X_-$ are log-concave. Lemma 7.2 yields

$$\begin{aligned}
E[|X|^k] &\le \Pr[X>0]\,kE[|X_+|]\,E[|X_+|^{k-1}] + \Pr[X<0]\,kE[|X_-|]\,E[|X_-|^{k-1}] \\
&\le k\max\{E[|X_+|], E[|X_-|]\}\left(\Pr[X>0]\,E[|X|^{k-1} \mid X>0] + \Pr[X<0]\,E[|X|^{k-1} \mid X<0]\right) \\
&= k\max\{E[|X| \mid X>0], E[|X| \mid X<0]\}\,E[|X|^{k-1}] \\
&\le k\,\frac{1}{\ln 2}\,E[|X|]\,E[|X|^{k-1}]
\end{aligned}$$

where we used Lemma 7.4 in the last inequality to bound $\max\{E[|X| \mid X<0], E[|X| \mid X>0]\} \le \frac{1}{\ln 2}E[|X|]$. ■

The survey § lists many distributions with log-concave densities: normal, exponential, logistic, 
extreme value, chi-square, chi, Laplace, Weibull, Gamma, and Beta, where the last three are
log-concave only for some parameter values. Lemma 7.3 implies that random variables with any of
these distributions are moment bounded.

Any random variable trivially satisfies $E[|X|^1] = 1\cdot E[|X|]\,E[|X|^{1-1}]$ so for every random variable
(that is non-zero with positive probability) Lemma 7.2 gives the smallest possible moment
boundedness parameter $L$ and Lemma 7.3 gives $L$ that is within a factor of $1/\ln 2$ of the best possible.



An exponentially distributed random variable is tight for Lemma 7.2 in an even stronger sense:
$E[|X|^k] = kE[|X|]\,E[|X|^{k-1}]$ for all integers $k \ge 1$.
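The equality for the exponential distribution follows from the standard formula $E[X^k] = k!/\theta^k$ for rate $\theta$. A short exact check, as a sketch with an arbitrarily chosen rate:

```python
from fractions import Fraction
from math import factorial

def exp_moment(k, rate):
    """E[X^k] = k! / rate^k for an exponential random variable with the given rate."""
    return Fraction(factorial(k)) / Fraction(rate) ** k

rate = 3
equalities = [
    exp_moment(k, rate) == k * exp_moment(1, rate) * exp_moment(k - 1, rate)
    for k in range(1, 11)
]
print(all(equalities))
```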
The following example shows that Lemma 7.3 is tight. Let $X$ have density

$$f(x) = \begin{cases} e^{x-x_0} & \text{if } x \le x_0 \\ 0 & \text{if } x > x_0 \end{cases}$$

where $x_0 = \ln 2$. This density is clearly log-concave. Using integration by parts we derive

$$\begin{aligned}
E[|X|] &= \int_{0}^{+\infty} x e^{-x-x_0}\,dx + \int_{0}^{\ln 2} x e^{x-x_0}\,dx \\
&= \left[-\frac{x e^{-x}}{2}\right]_{0}^{+\infty} + \int_{0}^{+\infty}\frac{e^{-x}}{2}\,dx + \left[\frac{x e^{x}}{2}\right]_{0}^{\ln 2} - \int_{0}^{\ln 2}\frac{e^{x}}{2}\,dx \\
&= 0 + 1/2 + \ln 2 - 1/2 = \ln 2.
\end{aligned}$$

The $k$-th moment for large $k$ is dominated by the exponential left tail: $E[|X|^k] = e^{-x_0}k! + O(1)$.
Therefore $E[|X|^k]/E[|X|^{k-1}] = (1+o(1))\,k = (1+o(1))\,k\,\frac{1}{\ln 2}E[|X|]$ as $k \to \infty$, hence $X$ is
moment-bounded for no $L < \frac{1}{\ln 2}E[|X|]$.
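The moments of this example can be evaluated numerically. The sketch below assumes the density $e^{x-x_0}$ on $x \le x_0 = \ln 2$ described above; the left tail is integrated in closed form and the bounded part by Simpson's rule (the helper `simpson` and the step count are illustrative choices).

```python
import math

X0 = math.log(2.0)

def simpson(g, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for j in range(1, n):
        s += (4 if j % 2 else 2) * g(a + j * h)
    return s * h / 3.0

def abs_moment(k):
    """E[|X|^k] for the density exp(x - X0) on x <= X0."""
    left = math.exp(-X0) * math.factorial(k)    # integral over x < 0
    right = simpson(lambda x: x ** k * math.exp(x - X0), 0.0, X0)
    return left + right

print(abs_moment(1), math.log(2.0))
print(abs_moment(20) / (20 * abs_moment(19)))   # approaches 1 = E|X| / ln 2
```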

7.2.1 Technical lemmas 

This section is devoted to proving the following Lemma. 

Lemma 7.4 For any random variable $X$ with log-concave density, $\Pr[X > 0] > 0$ and $\Pr[X < 0] > 0$
we have

$$\max\{E[|X| \mid X<0], E[|X| \mid X>0]\} \le \frac{1}{\ln 2}E[|X|] \approx 1.44\,E[|X|].$$



The following Lemma about log-concave functions is intuitive but a bit technical to prove.

Lemma 7.5 Suppose $f$ is a log-concave function, $x_0 > 0$,
$h(x) = \begin{cases} e^{x-x_0} & \text{if } x \le x_0 \\ 0 & \text{if } x > x_0 \end{cases}$, $f(0) = h(0)$,
and $\int_{-\infty}^{0} f(x)\,dx = \int_{-\infty}^{0} h(x)\,dx$. It follows that:

1. there exists $x_1 < 0$ such that $(x-x_1)(f(x)-h(x)) \ge 0$ for any $x \le 0$ and

2. $(x-x_0)(f(x)-h(x)) \ge 0$ for any $x \ge 0$.

Proof. Let $S^+ = \{x < 0 : f(x) > h(x)\}$ and $S^- = \{x < 0 : f(x) < h(x)\}$. We have

$$0 = \int_{-\infty}^{0} f(x)\,dx - \int_{-\infty}^{0} h(x)\,dx = \int_{S^+}(f(x)-h(x))\,dx - \int_{S^-}(h(x)-f(x))\,dx$$

and an integral of a positive function is positive iff it is over a set of positive measure, hence either
both $S^+$ and $S^-$ have Lebesgue measure zero or neither do. Below we will use the following simple
fact about concave functions. If $g(x)$ is concave, $g(z) = 0$, $g(z') > 0$ and $z' < z$ then $g(z'') > 0$ for
all $z'' \in (z', z)$.

The log-concavity of $f$ implies that $\ln f(x) - \ln h(x)$ is concave on $(-\infty, x_0]$. This and the fact that
$f(0) = h(0)$ imply the following key properties: if $x \in S^+$ then $S^+ \supseteq [x,0)$ and similarly if $y \in S^-$
then $S^- \supseteq (-\infty, y]$. Among other things these properties imply that if $S^+$ (resp. $S^-$) is non-empty
it contains an interval and hence has positive measure. If both $S^+$ and $S^-$ have measure zero then
$f(x) = h(x)$ for all $x < 0$ and any $x_1 < 0$ will satisfy the first part of the lemma. If both $S^+$ and
$S^-$ have positive measure then $x_1 = \inf S^+$ will satisfy the first part of the lemma.
The first part of the lemma implies that $x_2 = x_1/2 < 0$ satisfies $f(x_2) \ge h(x_2)$. For any $0 < x \le x_0$
the facts that $\ln f(x) - \ln h(x)$ is concave on $(-\infty, x_0]$, $f(x_2) \ge h(x_2)$ and $f(0) = h(0)$ imply that
$f(x) \le h(x)$. Clearly $f(x) \ge h(x) = 0$ for $x > x_0$, so the second part of the Lemma follows. ■



Now we prove Lemma 7.4. 

Proof. Let random variable $X$ be given with density $f$. We prove the upper-bound on $E[|X| \mid X < 0]$
only; the upper-bound on $E[|X| \mid X > 0]$ follows from this bound because $-X$ is log-concave. The
Lemma is invariant with respect to scaling $X$ so we can and do assume without loss of generality
that $f(0) = \Pr[X < 0]$.

Let $X'$ be a random variable with density
$h(x) = \begin{cases} e^{x-x_0} & \text{if } x \le x_0 \\ 0 & \text{if } x > x_0 \end{cases}$ where $x_0$ is the solution
to $e^{-x_0} = \Pr[X < 0]$. One can readily verify that $\Pr[X' < 0] = e^{-x_0} = \Pr[X < 0]$. Therefore,
$\Pr[X' > 0] = \Pr[X > 0]$, and $h(0) = f(0)$.


Using the first part of Lemma 7.5 we have

$$\begin{aligned}
0 &\le \int_{-\infty}^{0}(x-x_1)(f(x)-h(x))\,dx \\
&= \int_{-\infty}^{0} x(f(x)-h(x))\,dx - x_1\left(\Pr[X<0]-\Pr[X'<0]\right) \\
&= \Pr[X<0]\left(-E[|X| \mid X<0] + E[|X'| \mid X'<0]\right) - 0
\end{aligned}$$

i.e. $E[|X'| \mid X' < 0] \ge E[|X| \mid X < 0]$.
By the second part of Lemma 7.5 we have

$$\begin{aligned}
0 &\le \int_{0}^{+\infty}(x-x_0)(f(x)-h(x))\,dx \\
&= \int_{0}^{+\infty} x(f(x)-h(x))\,dx - x_0\left(\Pr[X>0]-\Pr[X'>0]\right) \\
&= \Pr[X>0]\left(E[|X| \mid X>0] - E[|X'| \mid X'>0]\right) - 0
\end{aligned}$$

i.e. $E[|X'| \mid X' > 0] \le E[|X| \mid X > 0]$.

We conclude that

$$\begin{aligned}
\frac{E[|X|]}{E[|X| \mid X<0]} &= \frac{\Pr[X<0]\,E[|X| \mid X<0] + \Pr[X>0]\,E[|X| \mid X>0]}{E[|X| \mid X<0]} \\
&= \Pr[X<0] + \Pr[X>0]\,\frac{E[|X| \mid X>0]}{E[|X| \mid X<0]} \\
&\ge \Pr[X'<0] + \Pr[X'>0]\,\frac{E[|X'| \mid X'>0]}{E[|X'| \mid X'<0]} \\
&= \Pr[X'<0] + \Pr[X'<0]\,\frac{\Pr[X'>0]\,E[|X'| \mid X'>0]}{\Pr[X'<0]\,E[|X'| \mid X'<0]} \\
&= e^{-x_0} + e^{-x_0}\,\frac{x_0+e^{-x_0}-1}{e^{-x_0}} \\
&= x_0 + 2e^{-x_0} - 1 = r(x_0)
\end{aligned} \tag{7.41}$$

where we used integration by parts to compute

$$\Pr[X'<0]\,E[|X'| \mid X'<0] = \int_{-\infty}^{0}(-x)e^{x-x_0}\,dx = (-x)e^{x-x_0}\Big|_{-\infty}^{0} - \int_{-\infty}^{0}(-1)e^{x-x_0}\,dx = 0 + e^{-x_0}$$

$$\Pr[X'>0]\,E[|X'| \mid X'>0] = \int_{0}^{x_0} x e^{x-x_0}\,dx = x e^{x-x_0}\Big|_{0}^{x_0} - \int_{0}^{x_0} e^{x-x_0}\,dx = x_0 + e^{-x_0} - 1.$$

Finally we note that $r(x_0)$ is minimized on $0 \le x_0 < \infty$ when $0 = r'(x_0) = 1 - 2e^{-x_0}$. We conclude
that

$$\frac{E[|X|]}{E[|X| \mid X<0]} \ge r(x_0) \ge r(\ln 2) = \ln 2$$

which implies the Lemma. ■
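The final minimization can be checked numerically. This sketch takes $r(x_0) = x_0 + 2e^{-x_0} - 1$ as in (7.41) and searches a grid; the grid range and spacing are arbitrary illustrative choices.

```python
import math

def r(x0):
    """r(x0) = x0 + 2 * exp(-x0) - 1, the lower bound from (7.41)."""
    return x0 + 2.0 * math.exp(-x0) - 1.0

grid = [0.001 * j for j in range(0, 10001)]   # x0 in [0, 10]
x_min = min(grid, key=r)
print(x_min, math.log(2.0), r(math.log(2.0)))
```

Since $r$ is convex with $r'(\ln 2) = 0$, the grid minimizer lands next to $\ln 2 \approx 0.693$ and the minimum value is $\ln 2$, which gives the constant $1/\ln 2$ in Lemma 7.4.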
7.3 Log-concave discrete random variables 

A distribution over the integers $\dots, p_{-2}, p_{-1}, p_0, p_1, p_2, \dots$ is said to be log-concave || |27| if
$p_{i+1}^2 \ge p_i p_{i+2}$ for all $i$. An integer-valued random variable $X$ is log-concave if its distribution
$p_i = \Pr[X = i]$ is.

Lemma 7.6 Any non-negative integer-valued log-concave random variable $X$ is moment bounded
with parameter $L = 1 + E[|X|]$.



Proof. The proof parallels the proof of Lemma 7.2. Let $r_i = \Pr[X \ge i] = \sum_{j=i}^{\infty} p_j$,
$\ell = \min\{i : p_i > 0\}$ and $u = \max\{i : p_i > 0\}$. For any $k \ge 1$ we have

$$\begin{aligned}
E[|X|^k] &= \sum_{x=0}^{\infty} p_x x^k \\
&= \sum_{x=1}^{\infty}(r_x - r_{x+1})x^k \\
&= \sum_{x=1}^{\infty} r_x\left(x^k - (x-1)^k\right) \\
&\le \sum_{x=1}^{\infty} r_x k x^{k-1} = \sum_{x=0}^{\infty} r_x k x^{k-1} \\
&= \sum_{x=0}^{\ell-1} k x^{k-1} + \sum_{x=\ell}^{u}\frac{r_x}{p_x}\,k x^{k-1} p_x \\
&\le \max\{0,\ell-1\}\,E[k|X|^{k-1}] + (E[|X|]+1-\ell)\,E[k|X|^{k-1}] \\
&\le (1+E[|X|])\,E[k|X|^{k-1}]
\end{aligned}$$

where the second inequality uses the fact that $r_x/p_x$ is a non-increasing sequence (Proposition 10 in
||) and Chebyshev's sum inequality. ■



Lemma 7.7 Any log-concave integer-valued random variable X is moment bounded with parameter 
L = max(E[|X| | X > 0] ,E[\X\ \ X < 0]). 



We omit the proof of Lemma 7.7, which is almost identical to the proof of Lemma 7.3.



Examples of log-concave integer-valued distributions include Poisson, binomial, negative binomial
and hypergeometric || [27]. Random variables with these distributions are moment bounded by
Lemma 7.6.



The parameter $1 + E[|X|]$ in Lemma 7.6 cannot be improved to match the $E[|X|]$ in Lemma 7.2.
Indeed a Poisson distributed random variable with mean $\mu$ has $E[X^2] = \mu^2 + \mu$, which exceeds the
desired bound of $2E[X]\,E[X^{2-1}] = 2\mu^2$ when $\mu < 1$.
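The Poisson example can be reproduced numerically. The sketch below computes moments from a truncated probability mass function (the cutoff `terms=200` is an illustrative choice that is far more than needed for small means) and checks both the failure of parameter $E[X]$ and the bound with parameter $1 + E[X]$ for the first few exponents.

```python
import math

def poisson_moment(mu, k, terms=200):
    """E[X^k] for X ~ Poisson(mu), using the first `terms` values of the pmf."""
    total = 0.0
    p = math.exp(-mu)          # Pr[X = 0]
    for x in range(terms):
        total += p * x ** k
        p *= mu / (x + 1)      # Pr[X = x + 1] from Pr[X = x]
    return total

mu = 0.5
m1, m2 = poisson_moment(mu, 1), poisson_moment(mu, 2)
print(m2, 2 * m1 * m1)         # mu^2 + mu versus 2 * mu^2
```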



8 Examples Showing Tightness of the Bounds 

This section deals exclusively with multilinear polynomials with non-negative coefficients over
independent 0/1 random variables. We use notation specialized to this case: for a polynomial $f(x)$
and 0/1 random variables $Y_1, \dots, Y_n$ we have

$$\mu_r(f,Y) = \max_{A\subseteq V:\,|A|=r}\left(\sum_{h\in\mathcal{H}\,:\,A\subseteq h} w_h\prod_{i\in h\setminus A}E[Y_i]\right).$$

We continue to omit the $Y$ from $\mu_r(f,Y)$ when it is clear from context.
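The parameters $\mu_r$ can be computed by brute force for small hypergraphs. The sketch below follows the definition above under the section's assumption of non-negative weights; the representation (a dict from `frozenset` hyperedges to weights, and a dict of means) and the name `mu` are illustrative choices. For the complete $q$-uniform polynomial with all means equal to $p$ the value is $\binom{n-r}{q-r}p^{q-r}$, which is at most $(np)^{q-r}$.

```python
from itertools import combinations
from math import comb, prod

def mu(r, weights, means):
    """Max over vertex sets A with |A| = r of the sum, over hyperedges h
    containing A, of w_h times the product of E[Y_i] over i in h minus A."""
    best = 0.0   # valid starting point because weights are non-negative
    for a in combinations(sorted(means), r):
        a_set = set(a)
        total = sum(
            w * prod(means[i] for i in h - a_set)
            for h, w in weights.items()
            if a_set <= h
        )
        best = max(best, total)
    return best

n, q, p = 5, 3, 0.25
means = {v: p for v in range(n)}
weights = {frozenset(h): 1.0 for h in combinations(range(n), q)}
for r in range(q + 1):
    print(r, mu(r, weights, means), comb(n - r, q - r) * p ** (q - r))
```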

Lemma 8.1 We are given a power $q_1$ polynomial $f(x)$ with corresponding hypergraph $H_1$, weights
$w_1$, vertices $V(H_1) = [n]$ and a power $q_2$ polynomial $g(x)$ with corresponding hypergraph $H_2$,
weights $w_2$, vertices $V(H_2) = \{n+1, \dots, m\}$. We are also given $m$ independent 0/1 random
variables $X_1, \dots, X_m$. Then the product polynomial $fg = (H,w)$ defined by $(fg)(x_1, \dots, x_m) =
f(x_1, \dots, x_n)\,g(x_{n+1}, \dots, x_m)$ satisfies

$$\mu_i(fg,X) = \max_{0\le i_1\le q_1,\ 0\le i-i_1\le q_2}\mu_{i_1}(f,X_1,\dots,X_n)\,\mu_{i-i_1}(g,X_{n+1},\dots,X_m).$$

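Proof. The Lemma follows easily from the definition of $\mu_i$ and the fact that restriction to
hyperedges containing a fixed set of vertices preserves the product structure. Indeed let $\mathcal{H}_1 = \mathcal{H}(H_1)$,
$\mathcal{H}_2 = \mathcal{H}(H_2)$ and $\mathcal{H} = \mathcal{H}(H)$. Then

$$\begin{aligned}
\mu_i(fg) &= \max_{A\subseteq V:\,|A|=i}\ \sum_{h\in\mathcal{H}\,:\,A\subseteq h} w_h\prod_{v\in h\setminus A}E[X_v] \\
&= \max_{\substack{0\le i_1\le q_1,\,0\le i_2\le q_2: \\ i_1+i_2=i}}\ \max_{\substack{A_1\subseteq V_1: \\ |A_1|=i_1}}\ \max_{\substack{A_2\subseteq V_2: \\ |A_2|=i_2}}\ \sum_{\substack{h_1\in\mathcal{H}_1: \\ A_1\subseteq h_1}}\ \sum_{\substack{h_2\in\mathcal{H}_2: \\ A_2\subseteq h_2}} w_{h_1}w_{h_2}\prod_{v\in h_1\setminus A_1}E[X_v]\prod_{v\in h_2\setminus A_2}E[X_v] \\
&= \max_{\substack{0\le i_1\le q_1,\,0\le i_2\le q_2: \\ i_1+i_2=i}}\left(\max_{\substack{A_1\subseteq V_1: \\ |A_1|=i_1}}\ \sum_{\substack{h_1\in\mathcal{H}_1: \\ A_1\subseteq h_1}} w_{h_1}\prod_{v\in h_1\setminus A_1}E[X_v]\right)\left(\max_{\substack{A_2\subseteq V_2: \\ |A_2|=i_2}}\ \sum_{\substack{h_2\in\mathcal{H}_2: \\ A_2\subseteq h_2}} w_{h_2}\prod_{v\in h_2\setminus A_2}E[X_v]\right) \\
&= \max_{\substack{0\le i_1\le q_1,\,0\le i_2\le q_2: \\ i_1+i_2=i}}\mu_{i_1}(f)\,\mu_{i_2}(g). \qquad ■
\end{aligned}$$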
Our next lemma studies a particular sort of complete multilinear $q$-uniform polynomials that we
will use frequently.

Lemma 8.2 Given $Z = \binom{\sum_v X_v}{q} = \sum_{h\subseteq[n]:\,|h|=q}\prod_{v\in h}X_v$ where the $X_v$ are independent 0/1
random variables with $E[X_v] = p \le 0.5$ we have

$$\mu_i(Z) \le (np)^{q-i} \quad\text{and}$$

$$\Pr\left[Z = \binom{c}{q}\right] = \binom{n}{c}p^c(1-p)^{n-c} \ge e^{-2np}\left(\frac{np}{c}\right)^c \quad\text{for any integer } 0 \le c \le n.$$


Proof. The first is immediate from definitions. The second follows because

$$\Pr\left[Z = \binom{c}{q}\right] = \Pr\left[\sum_v X_v = c\right] = \binom{n}{c}p^c(1-p)^{n-c} \ge (n/c)^c p^c e^{(n-c)\ln(1-p)} \ge (np/c)^c e^{-2np}$$

where we used $\ln(1-p) \ge -2p$ for $0 \le p \le 1/2$ in the last inequality. ■
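The binomial lower bound used in this proof, $\binom{n}{c}p^c(1-p)^{n-c} \ge e^{-2np}(np/c)^c$ for $p \le 1/2$, can be spot-checked numerically. The sketch below tests a handful of illustrative parameter values with $c \ge 1$ (the case $c = 0$ is skipped to avoid the $0^0$ convention).

```python
from math import comb, exp

def binomial_pmf(n, p, c):
    """Pr[sum of n independent 0/1 variables with mean p equals c]."""
    return comb(n, c) * p ** c * (1 - p) ** (n - c)

def lower_bound(n, p, c):
    """exp(-2 n p) * (n p / c) ** c, valid for 0 <= p <= 1/2 and c >= 1."""
    return exp(-2 * n * p) * (n * p / c) ** c

checks = [
    binomial_pmf(n, p, c) >= lower_bound(n, p, c) * (1 - 1e-12)
    for n in (5, 20, 50)
    for p in (0.01, 0.1, 0.5)
    for c in range(1, n + 1)
]
print(all(checks))
```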

Lemma 8.3 For any q G N, < e < 1, A > /x* > 0, there is a non-negative power q polynomial 
f(x) and independent 0/1 random variables X\, . . . , X m such that 



ti j (f,X)<e«-in* q for all0<j<q. 

Pr [f(X) -E[/(X)] > A] > exp{-2e} ( Mx/ ^ )l/q ) 



Pr [f(X) - E [f(X)] > fi*] > exp {-26} 



9+1 



Proof. We pick f(x) = n* ■ E/cm,|/|= 9 ILe/ x i where \ M \ 
with probability e/m < 1/2. By Lemma |8.2| we have 



m 



\Aq(\/fi*) 1 / q ] and each X t is 1 



e 



??? 



The third part of the lemma follows from Lemma |8.2| with c = q + 1. Indeed = (7 + 1 > 2 > 

e q + 1 > (E [/] + /x*)//x*. Therefore, 



Pr [/(X)-E[/(X)]>/4] >Pr 



/PO = ^ 



> e 



-2c 



exp{-2e} 
c/ V q + 1 



9+1 



Towards proving the second part of the lemma choose c such that 



c- 1\ < A + E [/] < (c 



We have A + E [/] > fi* so such ac> 5 + I exists. Note that 

(c~l) q < /c- A < A + E[/] < A + /x| < 2A 
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hence 

c < 1 + q{2X/ l xZ) 1 l q < Aq(X/fj,*) 1 ^ q < m. 



By the second part of Lemma |8T2 we have 

f(X) = li\ 



Pr[/(X)-E[/(X)] > A] >Pr 



>e- 2 U-) >e~ 2 * 



4q(X/fi*.y/i 



The binomial distribution $B(n,p)$ is the distribution of the sum of $n$ independent 0/1 random
variables each with mean $p$. The following lower bound on concentration of binomially distributed
random variables is well known (e.g. [23] has more general and precise bounds) but we include a
proof for completeness in the Appendix.

Lemma 8.4 For any $\mu \ge 27$ and $0 < \lambda \le \mu$ there exists a binomially distributed random variable
$Z$ with $E[Z] = \mu$ and

$$\Pr[Z \ge E[Z] + \lambda] \ge e^{-100-\lambda^2/\mu}. \tag{8.42}$$

Remark: The restriction that $\mu$ is bounded away from zero is needed since when $\mu = \lambda \ll 1$
the right hand side of (8.42) is constant and the left hand side is necessarily small because
$\Pr[Z \ge E[Z] + \lambda] = \Pr[Z \ge 1] \le E[Z] = \mu$.

Lemma 8.5 For any q G N, /i* > 0, /Xq > 27/i*, < A < [1q and < e < 1 there is a polynomial 
f of power q and independent 0/1 random variables X%, . . . ,X m such that 

• Mo(/) = Mo> 

• fij(f) < efi* for all 1 < j < q - 1, 
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• Pr[f(X) -E[f(X)] > A] > e- 100 e ^ . 

Proof. We fix a sufficiently large integer n such that < e and pick our polynomial f(X) 

to be essentially a linear function in disguise: 

f(X) = fi* ■ X qi+ i ■ Xg i+2 ■ ... ■ X qi+q 

0<i<n-l 

where Xi are boolean random variables with Pr[X{ = 1] = ^^^^ < e. Observe that n> q {f) = fi q , 
Mf) = Mo, and for 1 < i < q - 1 we have = < e ?~V* < e/J*. 

Observe that f(X)/fi* has the same distribution as a binomially distributed random variable with 
mean Hq/ n* q — 27. The lower-bound on Pr [f(X) — E [f(X)] > A] therefore follows from Lemma 8^ 
for sufficiently large n. ■ 

The following lemma shows how to use a counterexample polynomial of power less than q in place 
of a counterexample of power q. 
Lemma 8.6 For any $\epsilon > 0$, $q$-uniform hypergraph $H = (V,\mathcal{H})$, non-negative weights $w$, polynomial
$f(x) = \sum_{h\in\mathcal{H}} w_h\prod_{v\in h}x_v$, independent 0/1 random variables $X_1, \dots, X_n$ and any $q' > q$ there exists
a $q'$-uniform hypergraph $H' = (V',\mathcal{H}')$, non-negative weights $w'$, independent 0/1 random variables
$X'_1, \dots, X'_{n'}$, and polynomial $f'(x') = \sum_{h'\in\mathcal{H}'} w'_{h'}\prod_{i\in h'}x'_i$ such that

• $\mu_i(f') \le \mu_i(f)$ for all $i \le q$,

• $\mu_i(f') \le \epsilon\,\mu_q(f)$ for all $q < i \le q'$,

• $\Pr[f'(X') - E[f'(X')] \ge \lambda] \ge 2^{-(q'-q)}\Pr[f(X) - E[f(X)] \ge \lambda]$.

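Proof. We let $f'(X,Y) = f(X)g(Y)$ where $g(Y)$ is a power-$(q'-q)$ polynomial that is well
concentrated around 1. In particular we use

$$g(Y) = \prod_{i=1}^{q'-q}\left(\frac{2}{m}\sum_{j=1}^{m}Y_{i,j}\right)$$

where the $n' = (q'-q)m$ random variables $Y_{i,j}$ are independent with mean 1/2 and

$$m = \max\left(2/\epsilon,\ 2\max_{0\le i,j\le q}\frac{\mu_i(f,X)}{\mu_j(f,X)}\right).$$

Note that $2/m \le \epsilon$ and $2/m \le \frac{\mu_i(f,X)}{\mu_j(f,X)}$ for any $0 \le i,j \le q$. It is easy to see that $\mu_i(g,Y) = (2/m)^i$.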

TLX) 



Let $X' = X_1, \dots, X_n, Y_{1,1}, \dots, Y_{q'-q,m}$ denote the random variables that $f'$ is a function of. By
Lemma 8.1 we get that

$$\mu_i(f',X') = \max_{0\le j\le q,\ 0\le i-j\le q'-q}\mu_j(f)\,\mu_{i-j}(g) = \max_{0\le j\le\min(q,i)}B_{ij} \tag{8.43}$$

where $B_{ij} = \mu_j(f,X)(m/2)^{-(i-j)}$. We bound (8.43) in two cases. The first case is $i \le q$. For any
$0 \le j < i \le q$ we have

$$B_{ij} = \mu_j(f,X)(m/2)^{-(i-j)} \le \mu_j(f,X)(2/m) \le \mu_i(f,X). \tag{8.44}$$

Clearly $B_{ij} \le \mu_i(f,X)$ holds for $i = j$ as well, so we conclude that $\max_{0\le j\le\min(q,i)}B_{ij} \le \mu_i(f,X)$
when $i \le q$.

The other case is when $i > q$. For any $0 \le j < q < i$ we have

$$\mu_j(f,X)(m/2)^{-(i-j)} \le \mu_j(f,X)(2/m)^2 \le \mu_j(f,X)\cdot\frac{\mu_q(f,X)}{\mu_j(f,X)}\cdot\epsilon = \epsilon\,\mu_q(f,X). \tag{8.45}$$

Similarly for $0 \le j = q < i$ we have

$$\mu_j(f,X)(m/2)^{-(i-j)} = \mu_q(f,X)(m/2)^{-(i-q)} \le \mu_q(f)(2/m) \le \epsilon\,\mu_q(f,X). \tag{8.46}$$

Combining (8.43), (8.44), (8.45) and (8.46) we conclude that

$$\mu_i(f',X') \le \begin{cases}\mu_i(f,X), & \text{if } i \le q \\ \epsilon\,\mu_q(f,X), & \text{otherwise.}\end{cases}$$

To show the last part of the lemma we bound

$$\begin{aligned}
\Pr[f(X)g(Y) - E[f(X)g(Y)] \ge \lambda] &\ge \Pr[f(X) - E[f(X)] \ge \lambda \text{ and } g(Y) \ge 1] \\
&= \Pr[f(X) - E[f(X)] \ge \lambda]\,\Pr[g(Y) \ge 1] \\
&\ge \Pr[f(X) - E[f(X)] \ge \lambda]\,2^{-(q'-q)}
\end{aligned}$$

where the last inequality follows because each of the linear terms $\frac{2}{m}\sum_{j}Y_{i,j}$ in the definition of
$g(Y)$ is distributed symmetrically about its mean of 1 and hence is at least one with probability at
least 1/2. ■



Proof, of Theorem |1.3| . Fix q, A and {Ai*}o<j<g- Let i be the dominant term in (|L2|), i.e. i 
minimizes mm; (\ 2 /{fi* H*), (A/^*) 1 ^). We consider three cases. 

The first case is when A < fi*. In this case we apply Lemma |8^ (third part) to get a power i poly- 
nomial and then Lemma ^1] to convert it into a power q polynomial, using e = ^^o<j,j'<q / 1 1 *^ 
for both Lemmas. This yields 0/1 random variables X±, . . . , X n and the desired power q polynomial 
f(X) with (J,j(f,X) < fi* for < j < q and 

Pr \f(X) - E [f(X)] > A] > Pr [f(X) - E [f(X)] > Mi ] 

> 2-( ff_i ) exp |-2e + (i + 1) In (j^jj } 

> 2" 9 exp{-2 + (g + l)ln 



log Ci -(f>V VlJlogCi 

>max{e J ,e \ K ' J J } (8.47) 

where d = 2<?e 2 ((g + l)/e)<? +1 . 

The second case is when A > fx* and 27X 2 /(fi^fx*) > (X/fi*) 1 ^. We apply Lemma pT3| (second part) 
and Lemma 3J) using e = mino<j j'< g //* //x*, for both, yielding independent 0/1 random variables 
X±, . . . , X„ and a degree g polynomial /(X) with Hj(f, X) < fi*- for < j < q and 



Pr [/(X) — E [f(X)] > A] > exp j-2e + 4i(A/^*) lA m 

> 7^(C3)^ (A/ ^ ): 



4i(A/^)V« 



> J_(C 4 )- min (( A /K) 1/l :A 2 /(^M*)) 
C2 

> max < 



(^+l)log(max{C* 2 ,C 4 }) ~((^) +lj l°g(max{C 2 ,C 4 }) 



(8.48) 

for C 2 = e 2 2<?, C 3 = (A/^) 4 (f) 4i , and C 4 = Cf. 

The final case is when A > //* and 27X 2 < (X/fi*) 1 ^ 1 . These constraints imply that {Iq > 
27A 2 ~ 1 / i (/i*) 1/i ~ 1 > 27A, hence A < fa We also have fi* > 27A > 27yu|. We apply Lemmas |0 
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and Lemma ^6 with e = rnino<jj'< g (J>1j / fJ,j, for both, yielding a polynomial / and independent 0/1 
random variables Xi, . . . , X n with fAj(f, X) < /x* for < j < q and 

Pr [f(X 1 ,...,X n )>E[f}+\] > e- 100 2~^e-^ 

1 

" c 5 



> max < 



e V""' / ,e V V l/ / > (8.49) 



where C5 = e 100 • 2 q . This completes the case analysis. 

Let C = max{Ci, C2, C4, C5} < coA'f 1 A^Ag 3 for appropriate absolute constants cq, c±, C2 and C3 
where Ai = maxo<i J < g (//*//U*)' 3 = e -9 , A2 = maxi<,,< g A//i* and A3 = q q . The Theorem follows 
from ( p8|) and dg^j ). ■ 
A Linear special case

In this section we give a short proof of the linear case ($q = 1$) of Theorem 1.2. Concentration in
this case was already known, but this special case nicely illustrates many of our techniques with
minimal technical complications. In this case hyperedges are just single vertices, so to simplify
notation we make no reference to hyperedges. The rest of the proofs appear in the full version of
the paper.

We have $n$ vertices $1, 2, \dots, n$, independent random variables $Y_1, \dots, Y_n$ that are moment bounded
with parameter $L$, and weights $w_1, \dots, w_n$. We assume that $E[Y_v] = 0$ and $w_v \ge 0$ for all $v \in [n]$.
We are looking for concentration of $f(Y) = \sum_{v\in[n]} w_v Y_v$. Our bounds are based on the parameters
$\mu_0 = \sum_{v\in[n]} w_v E[|Y_v|]$ and $\mu_1 = \max_{v\in[n]} w_v$.
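For a concrete instance of this setup, take the $Y_v$ to be independent $-1/1$ variables, which have mean zero and are moment bounded with parameter 1 by Lemma 7.1. The sketch below computes $\mu_0$, $\mu_1$ and the exact even moments $E[f(Y)^k]$ by enumerating all sign patterns; the weights and helper names are illustrative, and the enumeration is only practical for small $n$.

```python
from fractions import Fraction
from itertools import product

def parameters(weights):
    """(mu_0, mu_1) for -1/1 variables, for which E|Y_v| = 1."""
    return sum(weights), max(weights)

def linear_moment(weights, k):
    """Exact E[f(Y)^k] for f(Y) = sum_v w_v Y_v with independent uniform -1/1 Y_v."""
    n = len(weights)
    total = Fraction(0)
    for signs in product((-1, 1), repeat=n):
        total += Fraction(sum(w * s for w, s in zip(weights, signs))) ** k
    return total / 2 ** n

weights = [1, 2, 3]
print(parameters(weights), linear_moment(weights, 2), linear_moment(weights, 4))
```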

Fix even integer $k \ge 2$. By linearity of expectation and independence we have



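$$\begin{aligned}
E[f(Y)^k] &= \sum_{v_1,\dots,v_k\in[n]} w_{v_1}\cdots w_{v_k}\,E[Y_{v_1}\cdots Y_{v_k}] \\
&= \sum_{v_1,\dots,v_k\in[n]} w_{v_1}\cdots w_{v_k}\prod_{v\in\{v_1,\dots,v_k\}}E\left[Y_v^{|\{i\in[k]\,:\,v_i=v\}|}\right].
\end{aligned} \tag{A.50}$$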

V!,...,v k e[n] ve{vi,...,v k } 



For conciseness we write the sum over $v_1, \dots, v_k \in [n]$ in (A.50) as a sum over vectors $\vec{v} \in [n]^k$
(with components $v_1, \dots, v_k$). The sum over $\vec{v}$ in (A.50) is awkward to bound because it is very
inhomogeneous, including e.g. both the case when $v_1 = \cdots = v_k$ and the case that all the $v_i$ are
distinct. We deal with this issue as follows. Intuitively we generate $\vec{v}$ by first picking the number
of distinct vertices $\ell = |\{v_1, \dots, v_k\}| = |\{\vec{v}\}|$, secondly picking a vector $\vec{u} \in [\ell]^k$ (with components
$u_1, \dots, u_k \in [\ell]$) of artificial vertices, and finally choosing an injective mapping $\pi$ from the artificial
vertices $[\ell]$ into the real vertices $[n]$ and letting $v_i = \pi(u_i)$. This process generates each vector $\vec{v}$ a
total of $\ell!$ times since the names of the artificial vertices are arbitrary. Combining the above with
(A.50) we have



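$$
E\left[f(Y)^k\right] = \sum_{\ell=1}^{k} \frac{1}{\ell!} \sum_{\vec{u} \in [\ell]^k \,:\, |\{\vec{u}\}| = \ell} \; \sum_{\pi \in M(\ell)} w_{\pi(u_1)} \cdots w_{\pi(u_k)} \prod_{u \in [\ell]} E\left[Y_{\pi(u)}^{|\{i \in [k] \,:\, u_i = u\}|}\right]
\tag{A.51}
$$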



where $M(\ell)$ denotes the set of all injective functions from $[\ell]$ to $[n]$. We introduce the notation
$d_u = d_u(u_1, \dots, u_k) = |\{i \in [k] : u_i = u\}|$ for the power of $Y_{\pi(u)}$ in (A.51). If any $d_u = 1$ we
have $E[Y_{\pi(u)}^{d_u}] = E[Y_{\pi(u)}] = 0$, so we can limit the sum in (A.51) to the set $S_2(\ell)$ of vectors $\vec{u} \in [\ell]^k$ with
$d_1(\vec{u}), \dots, d_\ell(\vec{u}) \ge 2$. The constraint that $|\{\vec{u}\}| = \ell$ is clearly satisfied for all $\vec{u} \in S_2(\ell)$ so we can
safely drop it. Note that $\sum_{u \in [\ell]} d_u = k$, so we can therefore reduce the range of $\ell$ to $1 \le \ell \le k/2$.
Consequently we have



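$$
E\left[f(Y)^k\right] = \sum_{\ell=1}^{k/2} \frac{1}{\ell!} \sum_{\vec{u} \in S_2(\ell)} \underbrace{\sum_{\pi \in M(\ell)} w_{\pi(u_1)} \cdots w_{\pi(u_k)} \prod_{u \in [\ell]} E\left[Y_{\pi(u)}^{d_u}\right]}_{(*)}.
\tag{A.52}
$$

We bound $(*)$ as

$$
\begin{aligned}
(*) &= \sum_{\pi \in M(\ell)} w_{\pi(u_1)} \cdots w_{\pi(u_k)} \prod_{u \in [\ell]} E\left[Y_{\pi(u)}^{d_u}\right] \\
&\le \sum_{\pi \in M(\ell)} w_{\pi(u_1)} \cdots w_{\pi(u_k)} \prod_{u \in [\ell]} E\left[|Y_{\pi(u)}|^{d_u}\right] \\
&\le \sum_{\pi \in M(\ell)} w_{\pi(u_1)} \cdots w_{\pi(u_k)} \prod_{u \in [\ell]} \left( L^{d_u - 1} \cdot d_u! \cdot E\left[|Y_{\pi(u)}|\right] \right) \\
&\le (\mu_1 L)^{k-\ell} \Big( \prod_{u \in [\ell]} d_u! \Big) \sum_{\pi \in M(\ell)} \prod_{u \in [\ell]} w_{\pi(u)} E\left[|Y_{\pi(u)}|\right]
\end{aligned}
\tag{A.53}
$$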



where the second inequality uses moment boundedness $d_u - 1$ times (per $u$) and the third inequality
follows because $\mu_1 = \max_v w_v$. We now extend the sum over $\pi \in M(\ell)$ in (A.53) (adding additional
non-negative terms) to include all mappings from $[\ell]$ into $[n]$, injective or not, which enables us to
move the sum over $\pi$ inside the product over $u$ as follows:

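$$
\begin{aligned}
(*) &\le (\mu_1 L)^{k-\ell} \Big( \prod_{u \in [\ell]} d_u! \Big) \sum_{\pi(1), \dots, \pi(\ell) \in [n]} \; \prod_{u \in [\ell]} w_{\pi(u)} E\left[|Y_{\pi(u)}|\right] \\
&= (\mu_1 L)^{k-\ell} \Big( \prod_{u \in [\ell]} d_u! \Big) \prod_{u \in [\ell]} \; \sum_{\pi(u) \in [n]} w_{\pi(u)} E\left[|Y_{\pi(u)}|\right] \\
&= (\mu_1 L)^{k-\ell} \Big( \prod_{u \in [\ell]} d_u! \Big) \Big( \sum_{v \in [n]} w_v E\left[|Y_v|\right] \Big)^{\ell} \\
&= (\mu_1 L)^{k-\ell} \Big( \prod_{u \in [\ell]} d_u! \Big) \mu_0^{\ell}.
\end{aligned}
\tag{A.54}
$$

Combining (A.52) with (A.54) we get

$$
\begin{aligned}
E\left[f(Y)^k\right] &\le \sum_{\ell=1}^{k/2} \frac{1}{\ell!} \sum_{\vec{u} \in S_2(\ell)} (\mu_1 L)^{k-\ell} \Big( \prod_{u \in [\ell]} d_u! \Big) \mu_0^{\ell} \\
&= \sum_{\ell=1}^{k/2} \frac{1}{\ell!} (\mu_1 L)^{k-\ell} \mu_0^{\ell} \sum_{d_1, \dots, d_\ell \ge 2 \,:\, d_1 + \cdots + d_\ell = k} \; \underbrace{\sum_{\vec{u} \in [\ell]^k \,:\, d_1(\vec{u}) = d_1, \dots, d_\ell(\vec{u}) = d_\ell} \Big( \prod_{u \in [\ell]} d_u! \Big)}_{(\dagger)}
\end{aligned}
\tag{A.55}
$$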

where the equality groups the sum over $\vec{u}$ by the value of the $d_u$. Now we claim that the equality
$(\dagger) = k!$ follows easily from either an appeal to multinomial coefficients or the following direct
argument. Indeed consider $k$ balls of which $d_u$ are labeled $u$ for all $u \in [\ell]$. Each of the $k!$
permutations of the balls induces a vector $\vec{u}$ of the labels. Every $\vec{u} \in [\ell]^k$ with $d_1(\vec{u}) = d_1, \dots, d_\ell(\vec{u}) = d_\ell$
is produced by exactly $\prod_{u \in [\ell]} d_u!$ permutations of the balls, proving the claim.
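
The two counting facts used so far, namely that each vector of real vertices is generated $\ell!$ times by the (artificial vector, injective map) construction and that $(\dagger) = k!$, can be checked by brute force on small instances. The sketch below is only an illustration: the function names `count_generations` and `dagger` are hypothetical, vertices and labels are 0-indexed, and the instance sizes are kept tiny so the enumeration finishes quickly.

```python
from collections import Counter
from itertools import permutations, product
from math import factorial, prod


def count_generations(v, n):
    """Count pairs (u, pi) that generate the vertex vector v.

    u ranges over vectors in [l]^k that use all l artificial labels,
    pi over injective maps [l] -> [n], and the pair must satisfy
    pi(u_i) = v_i for every i. Here l is the number of distinct
    entries of v. The text claims the count equals l!.
    """
    k = len(v)
    l = len(set(v))
    count = 0
    for u in product(range(l), repeat=k):
        if len(set(u)) != l:
            continue  # u must use every artificial vertex
        for pi in permutations(range(n), l):  # injective maps [l] -> [n]
            if all(pi[u_i] == v_i for u_i, v_i in zip(u, v)):
                count += 1
    return count


def dagger(d):
    """Sum of prod_u d_u! over all u in [l]^k whose label counts equal d."""
    l, k = len(d), sum(d)
    weight = prod(factorial(d_u) for d_u in d)
    total = 0
    for u in product(range(l), repeat=k):
        counts = Counter(u)
        if all(counts[j] == d[j] for j in range(l)):
            total += weight
    return total


if __name__ == "__main__":
    # A vector with 3 distinct vertices is generated 3! = 6 times.
    print(count_generations((0, 2, 0, 3), n=4))
    # (dagger) for d = (2, 2, 2) equals 6! = 720.
    print(dagger((2, 2, 2)))
```

The enumeration is exponential in $k$, so it is useful only as a sanity check of the combinatorial argument, not as a way to evaluate the moment.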

Substituting $(\dagger) = k!$ into (A.55), bounding the number of different $d_1, \dots, d_\ell \ge 1$ with $d_1 + \cdots + d_\ell = k$
by $2^k$, and bounding $k!/\ell! \le R_0^k k^k/\ell^\ell \le R_0^k \big( \max_{x > 0} (k/x)^x \big) k^{k-\ell} \le R_1^k k^{k-\ell}$ for some constants
$1 < R_0 < R_1$ we get

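$$
\begin{aligned}
E\left[f(Y)^k\right] &\le \sum_{\ell=1}^{k/2} \frac{1}{\ell!} (\mu_1 L)^{k-\ell} \mu_0^{\ell}\, k!\, 2^k \\
&\le (k/2)(2R_1)^k \max_{\ell \in [k/2]} k^{k-\ell} (\mu_1 L)^{k-\ell} \mu_0^{\ell} \\
&= (k/2)(2R_1)^k \max_{\ell \in [k/2]} (k \mu_1 L)^{k-2\ell} (\mu_0 k \mu_1 L)^{\ell} \\
&\le \Big( \underbrace{\max\big\{ 4R_1 k \mu_1 L,\; 4R_1 \sqrt{\mu_0 k \mu_1 L} \big\}}_{B} \Big)^{k}.
\end{aligned}
\tag{A.56}
$$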
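For any $\lambda > 0$ we choose $k$ so that $B \approx \lambda/e$, where $B$ is the maximum that is raised to the power $k$
in (A.56), and apply Markov's inequality to $f(Y)^k$, yielding

$$
\Pr\left[|f(Y)| \ge \lambda\right] \le e^{-k} \le e^2 \max\left\{ e^{-\lambda^2/(R \mu_0 \mu_1 L)},\; e^{-\lambda/(R \mu_1 L)} \right\}
$$

after some straightforward calculations (see proof of the main Theorem in the full version of the
paper) for some absolute constant $R$.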
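
The last step of the argument, turning a bound on an even moment into a tail bound via Markov's inequality, can be illustrated on a toy instance. The sketch below is an assumption-laden example rather than part of the proof: it uses Rademacher variables (which satisfy $E[|Y|^i] \le i \cdot L \cdot E[|Y|^{i-1}]$ with $L = 1$), a hypothetical weight vector, and exact enumeration of all $2^n$ sign patterns. The function name `exact_moment_and_tail` is made up for this illustration.

```python
from fractions import Fraction
from itertools import product


def exact_moment_and_tail(weights, k, lam):
    """Return (E[f(Y)^k], Pr[|f(Y)| >= lam]) for f(Y) = sum_v w_v Y_v.

    The Y_v are independent Rademacher (+1/-1 with probability 1/2),
    and both quantities are computed exactly by enumerating all sign
    patterns, so this is only feasible for small n.
    """
    n = len(weights)
    p = Fraction(1, 2 ** n)  # probability of each sign pattern
    moment = Fraction(0)
    tail = Fraction(0)
    for signs in product((-1, 1), repeat=n):
        f = sum(w * s for w, s in zip(weights, signs))
        moment += p * f ** k
        if abs(f) >= lam:
            tail += p
    return moment, tail


if __name__ == "__main__":
    weights = [1, 2, 3, 1, 2, 1, 1, 2]  # hypothetical non-negative weights
    lam = 9
    for k in (2, 4, 6, 8):
        moment, tail = exact_moment_and_tail(weights, k, lam)
        markov = moment / Fraction(lam) ** k  # E[f^k] / lam^k
        print(k, float(tail), float(markov), tail <= markov)
```

For every even $k$ the exact tail is at most $E[f(Y)^k]/\lambda^k$; the proof above additionally chooses $k$ as a function of $\lambda$ to make this bound small.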

We now sketch the differences between the above linear case and the general case that is proven in
the main body of this paper. In the general case the sequences of vertices $v_1, \dots, v_k$ and $u_1, \dots, u_k$
become sequences of hyperedges. The sums over $u \in [\ell]$ remain sums over vertices.

The biggest conceptual difference in the $q > 1$ case is that we consider the number of connected
components in the sequence of hyperedges that replaces $u_1, \dots, u_k$. Counting the number of
sequences of hyperedges with $c$ connected components is substantially trickier than the above bound
on $(\dagger)$.

Bounding the equivalent of $(*)$ by a product of various $\mu_i$ is also substantially more involved.



B Proof of Lemma 8.4



Let $Y$ be Poisson distributed with $E[Y] = \mu$, i.e. $\Pr[Y = i] = e^{-\mu} \mu^i / i!$ for non-negative integers $i$.
We will first show that $Y$ satisfies (8.42) but with a better constant (96 instead of 100). We will
then use a limiting argument to prove the lemma.

Let $\delta = \lambda/\mu \le 1$ and $\delta' = \delta + \frac{3\sqrt{3}}{\sqrt{\mu}} \le \delta + 1 \le 2$. We will frequently use the facts that $0 \le \delta \le 1$
and $0 \le \delta' \le 2$ without explicit mention. Let $f(t) = E[e^{tY}]$ and $g_a(t) = f(t) e^{-at}$. We will use
Theorem A.2.1 in [|]] which states that

$$
\Pr[Y \ge a - u] \ge e^{-tu} \left[ g_a(t) - e^{-\epsilon u} \big( g_a(t + \epsilon) + g_a(t - \epsilon) \big) \right]
\tag{B.57}
$$

for any $a, u, t, \epsilon \in \mathbb{R}$ with $u, t, \epsilon, t - \epsilon$ all positive. We choose these parameters as follows: let
$a = (1 + \delta')\mu$, $u = 3\sqrt{3\mu}$, $t = \ln(1 + \delta')$ and $\epsilon = \frac{1}{\sqrt{(1 + \delta')\mu}} \le \frac{1}{3\sqrt{3}}$. Note that by concavity

$$
t = \ln(1 + \delta') \ge \frac{\delta'}{2} \ln(1 + 2) \ge \frac{\delta'}{2} > \frac{1}{\sqrt{\mu}} \ge \epsilon,
$$

hence $t - \epsilon$ is positive as required.

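A standard calculation (e.g. Lemma 5.3 in Q) shows that $f(t') = e^{\mu(e^{t'} - 1)}$ and $g_a(t') = e^{\mu(e^{t'} - 1) - at'}$.
Therefore

$$
\begin{aligned}
\ln[g_a(t)] &= \mu\big(e^{\ln(1 + \delta')} - 1\big) - \mu(1 + \delta') \ln(1 + \delta') \\
&= \mu\big(\delta' - (1 + \delta') \ln(1 + \delta')\big) \\
&\ge -\mu \delta'^2 / 2
\end{aligned}
\tag{B.58}
$$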

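where the inequality follows from applying Taylor's theorem to the function $h(x) = x - (1 + x) \ln(1 + x)$.
We also have

$$
\begin{aligned}
\ln(g_a(t \pm \epsilon)) &= \mu\big((1 + \delta') e^{\pm\epsilon} - 1\big) - \big(\mu(1 + \delta')\big)\big(\ln(1 + \delta') \pm \epsilon\big) \\
&\le \mu\big((1 + \delta')(1 \pm \epsilon + \epsilon^2) - 1\big) - \big(\mu(1 + \delta')\big)\big(\ln(1 + \delta') \pm \epsilon\big) \\
&= \mu\delta' - \mu(1 + \delta') \ln(1 + \delta') + \mu(1 + \delta') \epsilon^2 \\
&= \ln[g_a(t)] + 1,
\end{aligned}
\tag{B.59}
$$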

where the inequality follows from Taylor's theorem and the fact that $e^{\pm\epsilon} \le e^{1/(3\sqrt{3})} \le 2$, and the
last equality uses the fact that $\mu(1 + \delta')\epsilon^2 = \mu(1 + \delta')\big(1/\sqrt{\mu(1 + \delta')}\big)^2 = 1$. (Inequality (B.59) is
shorthand for two inequalities, one (resp. the other) with $+$ (resp. $-$) substituted for $\pm$.)
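
Inequalities (B.58) and (B.59) lend themselves to a quick numerical spot check. The sketch below assumes the parameter choices as read from the surrounding derivation ($\delta' = \delta + 3\sqrt{3}/\sqrt{\mu}$, $a = (1+\delta')\mu$, $t = \ln(1+\delta')$, $\epsilon = 1/\sqrt{(1+\delta')\mu}$) and the Poisson formula $\ln g_a(s) = \mu(e^s - 1) - as$. The function names are hypothetical, and a finite grid of test points is of course no substitute for the proof.

```python
import math


def log_g(mu, a, s):
    """ln g_a(s) = mu * (e^s - 1) - a * s for Y ~ Poisson(mu)."""
    return mu * math.expm1(s) - a * s


def check_b58_b59(mu, lam):
    """Check t - eps > 0, (B.58) and both sign choices of (B.59)."""
    delta = lam / mu
    delta_p = delta + 3 * math.sqrt(3) / math.sqrt(mu)  # delta'
    a = (1 + delta_p) * mu
    t = math.log1p(delta_p)
    eps = 1 / math.sqrt((1 + delta_p) * mu)
    base = log_g(mu, a, t)
    return (
        t - eps > 0
        and base >= -mu * delta_p ** 2 / 2      # (B.58)
        and log_g(mu, a, t + eps) <= base + 1   # (B.59) with +
        and log_g(mu, a, t - eps) <= base + 1   # (B.59) with -
    )


if __name__ == "__main__":
    # mu >= 27 makes 3*sqrt(3)/sqrt(mu) <= 1, matching delta' <= delta + 1.
    for mu in (27.0, 100.0, 1e4):
        for frac in (0.0, 0.5, 1.0):
            print(mu, frac, check_b58_b59(mu, frac * mu))
```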

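Putting the pieces together we get

$$
\begin{aligned}
\Pr[Y \ge (1 + \delta)\mu] &= \Pr[Y \ge a - u] \\
&\ge e^{-\ln(1 + \delta') 3\sqrt{3\mu}} \left[ g_a(\ln(1 + \delta')) - e^{-\epsilon 3\sqrt{3\mu}} \big( g_a(\ln(1 + \delta') + \epsilon) + g_a(\ln(1 + \delta') - \epsilon) \big) \right] \\
&\ge e^{-\ln(1 + \delta') 3\sqrt{3\mu}}\, g_a(\ln(1 + \delta')) \left[ 1 - e^{-\epsilon 3\sqrt{3\mu}}\, 2e \right] \\
&\ge e^{-\delta' 3\sqrt{3\mu}}\, g_a(\ln(1 + \delta')) \left[ 1 - e^{-3}\, 2e \right] \\
&\ge e^{-\delta' 3\sqrt{3\mu}}\, e^{-\mu\delta'^2/2} \left[ 1 - e^{-3}\, 2e \right]
\end{aligned}
\tag{B.60}
$$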

where the first inequality uses (B.57), the second inequality uses (B.59), the third inequality uses
$\ln(1 + \delta') \le \delta'$ and $\epsilon \cdot 3\sqrt{3\mu} = \frac{1}{\sqrt{(1 + \delta')\mu}} \cdot 3\sqrt{3\mu} \ge 3\sqrt{3}/\sqrt{3} = 3$, and the fourth inequality uses (B.58).
Finally we bound

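$$
\begin{aligned}
\delta' 3\sqrt{3\mu} + \mu\delta'^2/2 + 1 &= \Big(\delta + \frac{3\sqrt{3}}{\sqrt{\mu}}\Big) 3\sqrt{3\mu} + \frac{\mu}{2}\Big(\delta + \frac{3\sqrt{3}}{\sqrt{\mu}}\Big)^2 + 1 \\
&= 3\sqrt{3}\,x + 27 + x^2/2 + 3\sqrt{3}\,x + 27/2 + 1 \\
&\le x^2/2 + 6\sqrt{3}\,x + 42 \\
&\le x^2 + (6\sqrt{3})^2/2 + 42 = \delta^2\mu + 96
\end{aligned}
\tag{B.61}
$$

where $x = \delta\sqrt{\mu}$. Equations (B.60) and (B.61) imply that

$$
\Pr[Y \ge E[Y] + \lambda] \ge \Pr[Y \ge (1 + \delta)\mu] \ge e^{-96 - \lambda^2/\mu}.
\tag{B.62}
$$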
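
As a sanity check on the arithmetic in (B.61) and on the resulting Poisson lower tail bound, the following sketch evaluates both sides numerically for a few parameter values. It assumes $\mu \ge 27$ and $0 \le \lambda \le \mu$, and reads (B.62) as $\Pr[Y \ge \mu + \lambda] \ge e^{-96 - \lambda^2/\mu}$. The function names and the truncation length `terms` are hypothetical choices for this illustration; truncating the tail sum can only underestimate the true tail, so the comparison stays conservative.

```python
import math


def b61_holds(mu, lam):
    """Check delta'*3*sqrt(3*mu) + mu*delta'^2/2 + 1 <= delta^2*mu + 96."""
    delta = lam / mu
    delta_p = delta + 3 * math.sqrt(3) / math.sqrt(mu)  # delta'
    lhs = delta_p * 3 * math.sqrt(3 * mu) + mu * delta_p ** 2 / 2 + 1
    return lhs <= delta ** 2 * mu + 96


def log_poisson_tail(mu, threshold, terms=2000):
    """Lower estimate of ln Pr[Y >= threshold] for Y ~ Poisson(mu).

    Sums `terms` consecutive probabilities starting at ceil(threshold),
    using log-sum-exp so that very small probabilities do not underflow.
    """
    start = math.ceil(threshold)
    logs = [
        -mu + i * math.log(mu) - math.lgamma(i + 1)
        for i in range(start, start + terms)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(x - top) for x in logs))


if __name__ == "__main__":
    for mu in (27.0, 100.0, 400.0):
        for frac in (0.0, 0.5, 1.0):
            lam = frac * mu
            bound = -96 - lam ** 2 / mu
            tail = log_poisson_tail(mu, mu + lam)
            print(mu, lam, b61_holds(mu, lam), tail >= bound)
```

On these inputs the actual tail is far larger than the bound, which is consistent with the constant 96 being chosen for convenience of the proof rather than tightness.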



To complete the proof of the lemma we use a limiting argument. Let $Z_1, Z_2, \dots$ be random variables
where $Z_n$ has binomial distribution $B(n, \mu/n)$. Straightforward calculation (e.g. Theorem 5.5 in
|p0|| ) shows that $\lim_{n \to \infty} \Pr[Z_n = i] = \Pr[Y = i]$ for any integer $i$ (i.e. $Z_n$ converges in distribution
to $Y$). It follows that $\lim_{n \to \infty} \Pr[Z_n \ge i] = 1 - \lim_{n \to \infty} \sum_{j=0}^{i-1} \Pr[Z_n = j] = 1 - \sum_{j=0}^{i-1} \Pr[Y = j] = \Pr[Y \ge i]$.
Consequently (choose $i = \lceil \mu + \lambda \rceil$) there exists $n' > 0$ such that
$\left| \Pr[Y \ge \mu + \lambda] - \Pr[Z_{n'} \ge \mu + \lambda] \right| \le \left| e^{-96 - \lambda^2/\mu} - e^{-100 - \lambda^2/\mu} \right|$.
Combining this fact with (B.62) yields $\Pr[Z_{n'} \ge \mu + \lambda] \ge e^{-100 - \lambda^2/\mu}$, i.e. $Z = Z_{n'}$ satisfies (8.42).